Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

Badr AlKhamissi, Mohamed Gabr, Muhammad ElNokrashy, Khaled Essam


Abstract
In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI) shared task (Abdul-Mageed et al., 2021) and demonstrate state-of-the-art results on all four of its subtasks. The subtasks are to identify the geographic origin of short Dialectal Arabic (DA) and Modern Standard Arabic (MSA) utterances at both the country and province levels. Our final model is an ensemble of variants built on top of MARBERT that achieves an F1-score of 34.03% for DA on the country-level development set, an improvement of 7.63% over previous work.
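The abstract does not say how the ensemble members are combined or how the F1-score is averaged. As a rough illustration only, the sketch below assumes soft voting (averaging per-class probabilities across models) and macro-averaged F1. The function names, the three-class toy labels, and the probability values are all hypothetical and are not taken from the paper or its code release.

```python
def soft_vote(prob_sets):
    """Average per-class probabilities across models, then pick the argmax.

    prob_sets: list of models, each a list of per-example probability lists.
    This combination rule is an assumption, not the paper's stated method.
    """
    n_models = len(prob_sets)
    predictions = []
    for i in range(len(prob_sets[0])):
        n_classes = len(prob_sets[0][i])
        avg = [sum(model[i][c] for model in prob_sets) / n_models
               for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=lambda c: avg[c]))
    return predictions


def macro_f1(gold, pred, n_classes):
    """Unweighted mean of per-class F1 (macro averaging is assumed here)."""
    f1_scores = []
    for c in range(n_classes):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n_classes


# Toy data: two hypothetical model variants, four utterances, three classes.
model_a = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
model_b = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.3, 0.4, 0.3], [0.3, 0.5, 0.2]]
gold = [0, 1, 2, 1]

preds = soft_vote([model_a, model_b])
score = macro_f1(gold, preds, n_classes=3)
print(preds, round(score, 4))
```

In a real setup, the probability lists would come from the softmax outputs of each fine-tuned variant on the same development set.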
Anthology ID:
2021.wanlp-1.29
Volume:
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Month:
April
Year:
2021
Address:
Kyiv, Ukraine (Virtual)
Editors:
Nizar Habash, Houda Bouamor, Hazem Hajj, Walid Magdy, Wajdi Zaghouani, Fethi Bougares, Nadi Tomeh, Ibrahim Abu Farha, Samia Touileb
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
260–264
URL:
https://aclanthology.org/2021.wanlp-1.29
Cite (ACL):
Badr AlKhamissi, Mohamed Gabr, Muhammad ElNokrashy, and Khaled Essam. 2021. Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 260–264, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
Cite (Informal):
Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task (AlKhamissi et al., WANLP 2021)
PDF:
https://aclanthology.org/2021.wanlp-1.29.pdf
Code:
mohamedgabr96/NeuralDialectDetector