@inproceedings{dione-etal-2023-masakhapos,
title = "{M}asakha{POS}: Part-of-Speech Tagging for Typologically Diverse {A}frican languages",
author = "Dione, Cheikh M. Bamba and
Adelani, David Ifeoluwa and
Nabende, Peter and
Alabi, Jesujoba and
Sindane, Thapelo and
Buzaaba, Happy and
Muhammad, Shamsuddeen Hassan and
Emezue, Chris Chinenye and
Ogayo, Perez and
Aremu, Anuoluwapo and
Gitau, Catherine and
Mbaye, Derguene and
Mukiibi, Jonathan and
Sibanda, Blessing and
Dossou, Bonaventure F. P. and
Bukula, Andiswa and
Mabuya, Rooweither and
Tapo, Allahsera Auguste and
Munkoh-Buabeng, Edwin and
Memdjokam Koagne, Victoire and
Ouoba Kabore, Fatoumata and
Taylor, Amelia and
Kalipe, Godson and
Macucwa, Tebogo and
Marivate, Vukosi and
Gwadabe, Tajuddeen and
Elvis, Mboning Tchiaze and
Onyenwe, Ikechukwu and
Atindogbe, Gratien and
Adelani, Tolulope and
Akinade, Idris and
Samuel, Olanrewaju and
Nahimana, Marien and
Musabeyezu, Th{\'e}og{\`e}ne and
Niyomutabazi, Emile and
Chimhenga, Ester and
Gotosa, Kudzai and
Mizha, Patrick and
Agbolo, Apelete and
Traore, Seydou and
Uchechukwu, Chinedu and
Yusuf, Aliyu and
Abdullahi, Muhammad and
Klakow, Dietrich",
editor = "Rogers, Anna and
Boyd-Graber, Jordan and
Okazaki, Naoaki",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.609/",
doi = "10.18653/v1/2023.acl-long.609",
pages = "10883--10900",
abstract = "In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random fields and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="dione-etal-2023-masakhapos">
<titleInfo>
<title>MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages</title>
</titleInfo>
<name type="personal">
<namePart type="given">Cheikh</namePart>
<namePart type="given">M</namePart>
<namePart type="given">Bamba</namePart>
<namePart type="family">Dione</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="given">Ifeoluwa</namePart>
<namePart type="family">Adelani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Peter</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jesujoba</namePart>
<namePart type="family">Alabi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Thapelo</namePart>
<namePart type="family">Sindane</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Happy</namePart>
<namePart type="family">Buzaaba</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shamsuddeen</namePart>
<namePart type="given">Hassan</namePart>
<namePart type="family">Muhammad</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chris</namePart>
<namePart type="given">Chinenye</namePart>
<namePart type="family">Emezue</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Perez</namePart>
<namePart type="family">Ogayo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Anuoluwapo</namePart>
<namePart type="family">Aremu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Catherine</namePart>
<namePart type="family">Gitau</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Derguene</namePart>
<namePart type="family">Mbaye</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jonathan</namePart>
<namePart type="family">Mukiibi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Blessing</namePart>
<namePart type="family">Sibanda</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bonaventure</namePart>
<namePart type="given">F</namePart>
<namePart type="given">P</namePart>
<namePart type="family">Dossou</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andiswa</namePart>
<namePart type="family">Bukula</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rooweither</namePart>
<namePart type="family">Mabuya</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Allahsera</namePart>
<namePart type="given">Auguste</namePart>
<namePart type="family">Tapo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Edwin</namePart>
<namePart type="family">Munkoh-Buabeng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Victoire</namePart>
<namePart type="family">Memdjokam Koagne</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fatoumata</namePart>
<namePart type="family">Ouoba Kabore</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Amelia</namePart>
<namePart type="family">Taylor</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Godson</namePart>
<namePart type="family">Kalipe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tebogo</namePart>
<namePart type="family">Macucwa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vukosi</namePart>
<namePart type="family">Marivate</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tajuddeen</namePart>
<namePart type="family">Gwadabe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mboning</namePart>
<namePart type="given">Tchiaze</namePart>
<namePart type="family">Elvis</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ikechukwu</namePart>
<namePart type="family">Onyenwe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Gratien</namePart>
<namePart type="family">Atindogbe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tolulope</namePart>
<namePart type="family">Adelani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Idris</namePart>
<namePart type="family">Akinade</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Olanrewaju</namePart>
<namePart type="family">Samuel</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marien</namePart>
<namePart type="family">Nahimana</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Théogène</namePart>
<namePart type="family">Musabeyezu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Emile</namePart>
<namePart type="family">Niyomutabazi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ester</namePart>
<namePart type="family">Chimhenga</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kudzai</namePart>
<namePart type="family">Gotosa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Patrick</namePart>
<namePart type="family">Mizha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Apelete</namePart>
<namePart type="family">Agbolo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Seydou</namePart>
<namePart type="family">Traore</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chinedu</namePart>
<namePart type="family">Uchechukwu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aliyu</namePart>
<namePart type="family">Yusuf</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Muhammad</namePart>
<namePart type="family">Abdullahi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dietrich</namePart>
<namePart type="family">Klakow</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2023-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Anna</namePart>
<namePart type="family">Rogers</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jordan</namePart>
<namePart type="family">Boyd-Graber</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Naoaki</namePart>
<namePart type="family">Okazaki</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Toronto, Canada</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random fields and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.</abstract>
<identifier type="citekey">dione-etal-2023-masakhapos</identifier>
<identifier type="doi">10.18653/v1/2023.acl-long.609</identifier>
<location>
<url>https://aclanthology.org/2023.acl-long.609/</url>
</location>
<part>
<date>2023-07</date>
<extent unit="page">
<start>10883</start>
<end>10900</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages
%A Dione, Cheikh M. Bamba
%A Adelani, David Ifeoluwa
%A Nabende, Peter
%A Alabi, Jesujoba
%A Sindane, Thapelo
%A Buzaaba, Happy
%A Muhammad, Shamsuddeen Hassan
%A Emezue, Chris Chinenye
%A Ogayo, Perez
%A Aremu, Anuoluwapo
%A Gitau, Catherine
%A Mbaye, Derguene
%A Mukiibi, Jonathan
%A Sibanda, Blessing
%A Dossou, Bonaventure F. P.
%A Bukula, Andiswa
%A Mabuya, Rooweither
%A Tapo, Allahsera Auguste
%A Munkoh-Buabeng, Edwin
%A Memdjokam Koagne, Victoire
%A Ouoba Kabore, Fatoumata
%A Taylor, Amelia
%A Kalipe, Godson
%A Macucwa, Tebogo
%A Marivate, Vukosi
%A Gwadabe, Tajuddeen
%A Elvis, Mboning Tchiaze
%A Onyenwe, Ikechukwu
%A Atindogbe, Gratien
%A Adelani, Tolulope
%A Akinade, Idris
%A Samuel, Olanrewaju
%A Nahimana, Marien
%A Musabeyezu, Théogène
%A Niyomutabazi, Emile
%A Chimhenga, Ester
%A Gotosa, Kudzai
%A Mizha, Patrick
%A Agbolo, Apelete
%A Traore, Seydou
%A Uchechukwu, Chinedu
%A Yusuf, Aliyu
%A Abdullahi, Muhammad
%A Klakow, Dietrich
%Y Rogers, Anna
%Y Boyd-Graber, Jordan
%Y Okazaki, Naoaki
%S Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F dione-etal-2023-masakhapos
%X In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random fields and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.
%R 10.18653/v1/2023.acl-long.609
%U https://aclanthology.org/2023.acl-long.609/
%U https://doi.org/10.18653/v1/2023.acl-long.609
%P 10883-10900
Markdown (Informal)
[MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages](https://aclanthology.org/2023.acl-long.609/) (Dione et al., ACL 2023)
ACL
- Cheikh M. Bamba Dione, David Ifeoluwa Adelani, Peter Nabende, Jesujoba Alabi, Thapelo Sindane, Happy Buzaaba, Shamsuddeen Hassan Muhammad, Chris Chinenye Emezue, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jonathan Mukiibi, Blessing Sibanda, Bonaventure F. P. Dossou, Andiswa Bukula, Rooweither Mabuya, Allahsera Auguste Tapo, Edwin Munkoh-Buabeng, Victoire Memdjokam Koagne, Fatoumata Ouoba Kabore, Amelia Taylor, Godson Kalipe, Tebogo Macucwa, Vukosi Marivate, Tajuddeen Gwadabe, Mboning Tchiaze Elvis, Ikechukwu Onyenwe, Gratien Atindogbe, Tolulope Adelani, Idris Akinade, Olanrewaju Samuel, Marien Nahimana, Théogène Musabeyezu, Emile Niyomutabazi, Ester Chimhenga, Kudzai Gotosa, Patrick Mizha, Apelete Agbolo, Seydou Traore, Chinedu Uchechukwu, Aliyu Yusuf, Muhammad Abdullahi, and Dietrich Klakow. 2023. MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10883–10900, Toronto, Canada. Association for Computational Linguistics.