An English-Swahili parallel corpus and its use for neural machine translation in the news domain
Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Mikel L. Forcada, Miquel Esplà-Gomis, Andrew Secker, Susie Coleman, Julie Wall
Abstract
This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet. We report the results of a pilot human evaluation performed by the news media organisations participating in the H2020 EU-funded project GoURMET.
- Anthology ID:
- 2020.eamt-1.32
- Volume:
- Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Lisboa, Portugal
- Editors:
- André Martins, Helena Moniz, Sara Fumega, Bruno Martins, Fernando Batista, Luisa Coheur, Carla Parra, Isabel Trancoso, Marco Turchi, Arianna Bisazza, Joss Moorkens, Ana Guerberof, Mary Nurminen, Lena Marg, Mikel L. Forcada
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation
- Pages:
- 299–308
- URL:
- https://aclanthology.org/2020.eamt-1.32
- Cite (ACL):
- Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Mikel L. Forcada, Miquel Esplà-Gomis, Andrew Secker, Susie Coleman, and Julie Wall. 2020. An English-Swahili parallel corpus and its use for neural machine translation in the news domain. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 299–308, Lisboa, Portugal. European Association for Machine Translation.
- Cite (Informal):
- An English-Swahili parallel corpus and its use for neural machine translation in the news domain (Sánchez-Martínez et al., EAMT 2020)
- PDF:
- https://aclanthology.org/2020.eamt-1.32.pdf
Export citation
@inproceedings{sanchez-martinez-etal-2020-english,
    title = "An {E}nglish-{S}wahili parallel corpus and its use for neural machine translation in the news domain",
    author = "S{\'a}nchez-Mart{\'\i}nez, Felipe and
      S{\'a}nchez-Cartagena, V{\'\i}ctor M. and
      P{\'e}rez-Ortiz, Juan Antonio and
      Forcada, Mikel L. and
      Espl{\`a}-Gomis, Miquel and
      Secker, Andrew and
      Coleman, Susie and
      Wall, Julie",
    editor = "Martins, Andr{\'e} and
      Moniz, Helena and
      Fumega, Sara and
      Martins, Bruno and
      Batista, Fernando and
      Coheur, Luisa and
      Parra, Carla and
      Trancoso, Isabel and
      Turchi, Marco and
      Bisazza, Arianna and
      Moorkens, Joss and
      Guerberof, Ana and
      Nurminen, Mary and
      Marg, Lena and
      Forcada, Mikel L.",
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.32",
    pages = "299--308",
    abstract = "This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet. We report the results of a pilot human evaluation performed by the news media organisations participating in the H2020 EU-funded project GoURMET.",
}
<?xml version="1.0" encoding="UTF-8"?> <modsCollection xmlns="http://www.loc.gov/mods/v3"> <mods ID="sanchez-martinez-etal-2020-english"> <titleInfo> <title>An English-Swahili parallel corpus and its use for neural machine translation in the news domain</title> </titleInfo> <name type="personal"> <namePart type="given">Felipe</namePart> <namePart type="family">Sánchez-Martínez</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Víctor</namePart> <namePart type="given">M</namePart> <namePart type="family">Sánchez-Cartagena</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Juan</namePart> <namePart type="given">Antonio</namePart> <namePart type="family">Pérez-Ortiz</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Mikel</namePart> <namePart type="given">L</namePart> <namePart type="family">Forcada</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Miquel</namePart> <namePart type="family">Esplà-Gomis</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Andrew</namePart> <namePart type="family">Secker</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Susie</namePart> <namePart type="family">Coleman</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Julie</namePart> <namePart type="family">Wall</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> 
<dateIssued>2020-11</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the 22nd Annual Conference of the European Association for Machine Translation</title> </titleInfo> <name type="personal"> <namePart type="given">André</namePart> <namePart type="family">Martins</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Helena</namePart> <namePart type="family">Moniz</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Sara</namePart> <namePart type="family">Fumega</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Bruno</namePart> <namePart type="family">Martins</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Fernando</namePart> <namePart type="family">Batista</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Luisa</namePart> <namePart type="family">Coheur</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Carla</namePart> <namePart type="family">Parra</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Isabel</namePart> <namePart type="family">Trancoso</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Marco</namePart> <namePart type="family">Turchi</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name 
type="personal"> <namePart type="given">Arianna</namePart> <namePart type="family">Bisazza</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Joss</namePart> <namePart type="family">Moorkens</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Ana</namePart> <namePart type="family">Guerberof</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Mary</namePart> <namePart type="family">Nurminen</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Lena</namePart> <namePart type="family">Marg</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Mikel</namePart> <namePart type="given">L</namePart> <namePart type="family">Forcada</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>European Association for Machine Translation</publisher> <place> <placeTerm type="text">Lisboa, Portugal</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet. 
We report the results of a pilot human evaluation performed by the news media organisations participating in the H2020 EU-funded project GoURMET.</abstract> <identifier type="citekey">sanchez-martinez-etal-2020-english</identifier> <location> <url>https://aclanthology.org/2020.eamt-1.32</url> </location> <part> <date>2020-11</date> <extent unit="page"> <start>299</start> <end>308</end> </extent> </part> </mods> </modsCollection>
%0 Conference Proceedings
%T An English-Swahili parallel corpus and its use for neural machine translation in the news domain
%A Sánchez-Martínez, Felipe
%A Sánchez-Cartagena, Víctor M.
%A Pérez-Ortiz, Juan Antonio
%A Forcada, Mikel L.
%A Esplà-Gomis, Miquel
%A Secker, Andrew
%A Coleman, Susie
%A Wall, Julie
%Y Martins, André
%Y Moniz, Helena
%Y Fumega, Sara
%Y Martins, Bruno
%Y Batista, Fernando
%Y Coheur, Luisa
%Y Parra, Carla
%Y Trancoso, Isabel
%Y Turchi, Marco
%Y Bisazza, Arianna
%Y Moorkens, Joss
%Y Guerberof, Ana
%Y Nurminen, Mary
%Y Marg, Lena
%Y Forcada, Mikel L.
%S Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
%D 2020
%8 November
%I European Association for Machine Translation
%C Lisboa, Portugal
%F sanchez-martinez-etal-2020-english
%X This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet. We report the results of a pilot human evaluation performed by the news media organisations participating in the H2020 EU-funded project GoURMET.
%U https://aclanthology.org/2020.eamt-1.32
%P 299-308
Markdown (Informal)
[An English-Swahili parallel corpus and its use for neural machine translation in the news domain](https://aclanthology.org/2020.eamt-1.32) (Sánchez-Martínez et al., EAMT 2020)