MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
David Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba Alabi, Shamsuddeen Muhammad, Peter Nabende, Cheikh M. Bamba Dione, Andiswa Bukula, Rooweither Mabuya, Bonaventure F. P. Dossou, Blessing Sibanda, Happy Buzaaba, Jonathan Mukiibi, Godson Kalipe, Derguene Mbaye, Amelia Taylor, Fatoumata Kabore, Chris Chinenye Emezue, Anuoluwapo Aremu, Perez Ogayo, Catherine Gitau, Edwin Munkoh-Buabeng, Victoire Memdjokam Koagne, Allahsera Auguste Tapo, Tebogo Macucwa, Vukosi Marivate, Mboning Tchiaze Elvis, Tajuddeen Gwadabe, Tosin Adewumi, Orevaoghene Ahia, Joyce Nakatumba-Nabende
Abstract
African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14% over 20 languages as compared to using English.- Anthology ID:
- 2022.emnlp-main.298
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 4488–4508
- Language:
- URL:
- https://aclanthology.org/2022.emnlp-main.298
- DOI:
- 10.18653/v1/2022.emnlp-main.298
- Bibkey:
- Cite (ACL):
- David Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba Alabi, Shamsuddeen Muhammad, Peter Nabende, Cheikh M. Bamba Dione, Andiswa Bukula, Rooweither Mabuya, Bonaventure F. P. Dossou, Blessing Sibanda, Happy Buzaaba, Jonathan Mukiibi, Godson Kalipe, Derguene Mbaye, et al.. 2022. MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 4488–4508, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition (Adelani et al., EMNLP 2022)
- Copy Citation:
- PDF:
- https://aclanthology.org/2022.emnlp-main.298.pdf
Export citation
@inproceedings{adelani-etal-2022-masakhaner, title = "{M}asakha{NER} 2.0: {A}frica-centric Transfer Learning for Named Entity Recognition", author = "Adelani, David and Neubig, Graham and Ruder, Sebastian and Rijhwani, Shruti and Beukman, Michael and Palen-Michel, Chester and Lignos, Constantine and Alabi, Jesujoba and Muhammad, Shamsuddeen and Nabende, Peter and Dione, Cheikh M. Bamba and Bukula, Andiswa and Mabuya, Rooweither and Dossou, Bonaventure F. P. and Sibanda, Blessing and Buzaaba, Happy and Mukiibi, Jonathan and Kalipe, Godson and Mbaye, Derguene and Taylor, Amelia and Kabore, Fatoumata and Emezue, Chris Chinenye and Aremu, Anuoluwapo and Ogayo, Perez and Gitau, Catherine and Munkoh-Buabeng, Edwin and Memdjokam Koagne, Victoire and Tapo, Allahsera Auguste and Macucwa, Tebogo and Marivate, Vukosi and Elvis, Mboning Tchiaze and Gwadabe, Tajuddeen and Adewumi, Tosin and Ahia, Orevaoghene and Nakatumba-Nabende, Joyce", editor = "Goldberg, Yoav and Kozareva, Zornitsa and Zhang, Yue", booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing", month = dec, year = "2022", address = "Abu Dhabi, United Arab Emirates", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.emnlp-main.298", doi = "10.18653/v1/2022.emnlp-main.298", pages = "4488--4508", abstract = "African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14{\%} over 20 languages as compared to using English.", }
<?xml version="1.0" encoding="UTF-8"?> <modsCollection xmlns="http://www.loc.gov/mods/v3"> <mods ID="adelani-etal-2022-masakhaner"> <titleInfo> <title>MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition</title> </titleInfo> <name type="personal"> <namePart type="given">David</namePart> <namePart type="family">Adelani</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Graham</namePart> <namePart type="family">Neubig</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Sebastian</namePart> <namePart type="family">Ruder</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Shruti</namePart> <namePart type="family">Rijhwani</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Michael</namePart> <namePart type="family">Beukman</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Chester</namePart> <namePart type="family">Palen-Michel</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Constantine</namePart> <namePart type="family">Lignos</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Jesujoba</namePart> <namePart type="family">Alabi</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Shamsuddeen</namePart> <namePart type="family">Muhammad</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Peter</namePart> <namePart type="family">Nabende</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Cheikh</namePart> <namePart type="given">M</namePart> <namePart type="given">Bamba</namePart> <namePart type="family">Dione</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Andiswa</namePart> <namePart type="family">Bukula</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Rooweither</namePart> <namePart type="family">Mabuya</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Bonaventure</namePart> <namePart type="given">F</namePart> <namePart type="given">P</namePart> <namePart type="family">Dossou</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Blessing</namePart> <namePart type="family">Sibanda</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Happy</namePart> <namePart type="family">Buzaaba</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Jonathan</namePart> <namePart type="family">Mukiibi</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Godson</namePart> <namePart type="family">Kalipe</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Derguene</namePart> <namePart type="family">Mbaye</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Amelia</namePart> <namePart type="family">Taylor</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Fatoumata</namePart> <namePart type="family">Kabore</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Chris</namePart> <namePart type="given">Chinenye</namePart> <namePart type="family">Emezue</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Anuoluwapo</namePart> <namePart type="family">Aremu</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Perez</namePart> <namePart type="family">Ogayo</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Catherine</namePart> <namePart type="family">Gitau</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Edwin</namePart> <namePart type="family">Munkoh-Buabeng</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Victoire</namePart> <namePart type="family">Memdjokam Koagne</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Allahsera</namePart> <namePart type="given">Auguste</namePart> <namePart type="family">Tapo</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Tebogo</namePart> <namePart type="family">Macucwa</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Vukosi</namePart> <namePart type="family">Marivate</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Mboning</namePart> <namePart type="given">Tchiaze</namePart> <namePart type="family">Elvis</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Tajuddeen</namePart> <namePart type="family">Gwadabe</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Tosin</namePart> <namePart type="family">Adewumi</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Orevaoghene</namePart> <namePart type="family">Ahia</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Joyce</namePart> <namePart type="family">Nakatumba-Nabende</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2022-12</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</title> </titleInfo> <name type="personal"> <namePart type="given">Yoav</namePart> <namePart type="family">Goldberg</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Zornitsa</namePart> <namePart type="family">Kozareva</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Yue</namePart> <namePart type="family">Zhang</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>Association for Computational Linguistics</publisher> <place> <placeTerm type="text">Abu Dhabi, United Arab Emirates</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14% over 20 languages as compared to using English.</abstract> <identifier type="citekey">adelani-etal-2022-masakhaner</identifier> <identifier type="doi">10.18653/v1/2022.emnlp-main.298</identifier> <location> <url>https://aclanthology.org/2022.emnlp-main.298</url> </location> <part> <date>2022-12</date> <extent unit="page"> <start>4488</start> <end>4508</end> </extent> </part> </mods> </modsCollection>
%0 Conference Proceedings %T MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition %A Adelani, David %A Neubig, Graham %A Ruder, Sebastian %A Rijhwani, Shruti %A Beukman, Michael %A Palen-Michel, Chester %A Lignos, Constantine %A Alabi, Jesujoba %A Muhammad, Shamsuddeen %A Nabende, Peter %A Dione, Cheikh M. Bamba %A Bukula, Andiswa %A Mabuya, Rooweither %A Dossou, Bonaventure F. P. %A Sibanda, Blessing %A Buzaaba, Happy %A Mukiibi, Jonathan %A Kalipe, Godson %A Mbaye, Derguene %A Taylor, Amelia %A Kabore, Fatoumata %A Emezue, Chris Chinenye %A Aremu, Anuoluwapo %A Ogayo, Perez %A Gitau, Catherine %A Munkoh-Buabeng, Edwin %A Memdjokam Koagne, Victoire %A Tapo, Allahsera Auguste %A Macucwa, Tebogo %A Marivate, Vukosi %A Elvis, Mboning Tchiaze %A Gwadabe, Tajuddeen %A Adewumi, Tosin %A Ahia, Orevaoghene %A Nakatumba-Nabende, Joyce %Y Goldberg, Yoav %Y Kozareva, Zornitsa %Y Zhang, Yue %S Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing %D 2022 %8 December %I Association for Computational Linguistics %C Abu Dhabi, United Arab Emirates %F adelani-etal-2022-masakhaner %X African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14% over 20 languages as compared to using English. %R 10.18653/v1/2022.emnlp-main.298 %U https://aclanthology.org/2022.emnlp-main.298 %U https://doi.org/10.18653/v1/2022.emnlp-main.298 %P 4488-4508
Markdown (Informal)
[MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition](https://aclanthology.org/2022.emnlp-main.298) (Adelani et al., EMNLP 2022)
- MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition (Adelani et al., EMNLP 2022)
ACL
- David Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba Alabi, Shamsuddeen Muhammad, Peter Nabende, Cheikh M. Bamba Dione, Andiswa Bukula, Rooweither Mabuya, Bonaventure F. P. Dossou, Blessing Sibanda, Happy Buzaaba, Jonathan Mukiibi, Godson Kalipe, Derguene Mbaye, et al.. 2022. MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 4488–4508, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.