AraT5-MSAizer: Translating Dialectal Arabic to MSA

Murhaf Fares


Abstract
This paper outlines the process of training the AraT5-MSAizer model, a transformer-based neural machine translation model aimed at translating five regional Arabic dialects into Modern Standard Arabic (MSA). Developed for Task 2 of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools, the model attained a BLEU score of 21.79% on the test set associated with this task.
Anthology ID:
2024.osact-1.16
Volume:
Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Hend Al-Khalifa, Kareem Darwish, Hamdy Mubarak, Mona Ali, Tamer Elsayed
Venues:
OSACT | WS
Publisher:
ELRA and ICCL
Pages:
124–129
URL:
https://aclanthology.org/2024.osact-1.16
Cite (ACL):
Murhaf Fares. 2024. AraT5-MSAizer: Translating Dialectal Arabic to MSA. In Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024, pages 124–129, Torino, Italia. ELRA and ICCL.
Cite (Informal):
AraT5-MSAizer: Translating Dialectal Arabic to MSA (Fares, OSACT-WS 2024)
PDF:
https://aclanthology.org/2024.osact-1.16.pdf