Standardisation of Dialect Comments in Social Networks in View of Sentiment Analysis : Case of Tunisian Dialect

Saméh Kchaou, Rahma Boujelbane, Emna Fsih, Lamia Hadrich-Belguith


Abstract
With growing access to the internet, spoken Arabic dialects have become informal written languages on social media. Most users post comments in their own dialect. This linguistic situation inhibits mutual understanding between internet users and makes it difficult to apply computational approaches, since most Arabic resources are intended for the formal language, Modern Standard Arabic (MSA). In this paper, we present a pipeline that standardizes texts written in social networks by translating them into the standard language, MSA. We first fine-tune a BERT-based identification model to distinguish Tunisian Dialect (TD) from MSA and other dialects. We then train a transformer model to translate TD into MSA. The final system output includes the translated TD text and the text originally written in MSA. Each of these steps was evaluated on the same test corpus. To test the effectiveness of the approach, we compared two opinion analysis models, the first intended for the Sentiment Analysis (SA) of dialect texts and the second for MSA texts. We conclude that standardization yields the best score.
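The control flow of the pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the label strings, and the toy stand-in functions are all hypothetical, and the fine-tuned BERT-based identifier and the TD-to-MSA transformer are represented only by plain callables. Dropping comments identified as other dialects is an assumption drawn from the statement that the final output contains translated TD text and original MSA text.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical label names; the abstract only says the identifier
# separates TD from MSA and other dialects.
TD, MSA, OTHER = "TD", "MSA", "OTHER"


@dataclass
class StandardizationPipeline:
    """Routes each comment through identification, then translation if needed."""

    identify: Callable[[str], str]   # stand-in for the BERT-based identifier
    translate: Callable[[str], str]  # stand-in for the TD-to-MSA transformer

    def standardize(self, comments: List[str]) -> List[str]:
        standardized = []
        for comment in comments:
            label = self.identify(comment)
            if label == TD:
                # Tunisian Dialect comments are translated into MSA.
                standardized.append(self.translate(comment))
            elif label == MSA:
                # Comments already in MSA are kept unchanged.
                standardized.append(comment)
            # Assumption: comments in other dialects are left out.
        return standardized


# Toy stand-ins so the sketch runs without any trained model.
# Each toy comment carries its label as a prefix, e.g. "TD:some text".
def toy_identify(text: str) -> str:
    return text.split(":", 1)[0]


def toy_translate(text: str) -> str:
    return "MSA:" + text.split(":", 1)[1]


pipeline = StandardizationPipeline(identify=toy_identify, translate=toy_translate)
result = pipeline.standardize(["TD:a", "MSA:b", "OTHER:c"])
print(result)
```

In this arrangement a single sentiment model trained on MSA could then be applied to every element of `result`, which corresponds to the second of the two opinion analysis settings compared in the abstract.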
Anthology ID:
2022.lrec-1.582
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5436–5443
URL:
https://aclanthology.org/2022.lrec-1.582
Cite (ACL):
Saméh Kchaou, Rahma Boujelbane, Emna Fsih, and Lamia Hadrich-Belguith. 2022. Standardisation of Dialect Comments in Social Networks in View of Sentiment Analysis : Case of Tunisian Dialect. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5436–5443, Marseille, France. European Language Resources Association.
Cite (Informal):
Standardisation of Dialect Comments in Social Networks in View of Sentiment Analysis : Case of Tunisian Dialect (Kchaou et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.582.pdf