HATE-ITA: Hate Speech Detection in Italian Social Media Text

Debora Nozza, Federico Bianchi, Giuseppe Attanasio


Abstract
Online hate speech is a dangerous phenomenon that can (and should) be countered promptly and properly. While Natural Language Processing supplies appropriate algorithms for this objective, most research efforts are directed toward the English language. This strongly limits classification power in non-English languages. In this paper, we test several learning frameworks for identifying hate speech in Italian text. We release HATE-ITA, a multi-language model trained on a large set of English data and available Italian datasets. HATE-ITA performs better than mono-lingual models and also seems to adapt well to language-specific slurs. We hope our findings will encourage research in other mid-to-low resource communities and provide a valuable benchmarking tool for the Italian community.
Anthology ID:
2022.woah-1.24
Volume:
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)
Month:
July
Year:
2022
Address:
Seattle, Washington (Hybrid)
Editors:
Kanika Narang, Aida Mostafazadeh Davani, Lambert Mathias, Bertie Vidgen, Zeerak Talat
Venue:
WOAH
Publisher:
Association for Computational Linguistics
Pages:
252–260
URL:
https://aclanthology.org/2022.woah-1.24
DOI:
10.18653/v1/2022.woah-1.24
Cite (ACL):
Debora Nozza, Federico Bianchi, and Giuseppe Attanasio. 2022. HATE-ITA: Hate Speech Detection in Italian Social Media Text. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 252–260, Seattle, Washington (Hybrid). Association for Computational Linguistics.
Cite (Informal):
HATE-ITA: Hate Speech Detection in Italian Social Media Text (Nozza et al., WOAH 2022)
PDF:
https://aclanthology.org/2022.woah-1.24.pdf
Video:
https://aclanthology.org/2022.woah-1.24.mp4
Code:
milanlproc/hate-ita