A Comparative Analysis of Noise Reduction Methods in Sentiment Analysis on Noisy Bangla Texts

Kazi Elahi, Tasnuva Rahman, Shakil Shahriar, Samir Sarker, Md. Shawon, G. M. Shahariar


Abstract
Although Bangla is considered a low-resource language, sentiment analysis in Bangla has been studied extensively in the literature. Nevertheless, sentiment analysis of noisy Bangla texts specifically remains largely unexplored. In this paper, we introduce a dataset (NC-SentNoB) that we annotated manually to identify ten different types of noise found in a pre-existing sentiment analysis dataset comprising around 15K noisy Bangla texts. First, given a noisy input text, we identify its noise types, treating this as a multi-label classification task. Then, we introduce baseline noise reduction methods to alleviate noise before conducting sentiment analysis. Finally, we compare the performance of fine-tuned sentiment analysis models on noisy and noise-reduced texts. The experimental findings indicate that the noise reduction methods used are not satisfactory, highlighting the need for more suitable noise reduction methods in future research. We have made the implementation and dataset presented in this paper publicly available at https://github.com/ktoufiquee/A-Comparative-Analysis-of-Noise-Reduction-Methods-in-Sentiment-Analysis-on-Noisy-Bangla-Texts
Anthology ID:
2024.wnut-1.5
Volume:
Proceedings of the Ninth Workshop on Noisy and User-generated Text (W-NUT 2024)
Month:
March
Year:
2024
Address:
San Ġiljan, Malta
Editors:
Rob van der Goot, JinYeong Bak, Max Müller-Eberstein, Wei Xu, Alan Ritter, Tim Baldwin
Venues:
WNUT | WS
Publisher:
Association for Computational Linguistics
Pages:
44–57
URL:
https://aclanthology.org/2024.wnut-1.5
Cite (ACL):
Kazi Elahi, Tasnuva Rahman, Shakil Shahriar, Samir Sarker, Md. Shawon, and G. M. Shahariar. 2024. A Comparative Analysis of Noise Reduction Methods in Sentiment Analysis on Noisy Bangla Texts. In Proceedings of the Ninth Workshop on Noisy and User-generated Text (W-NUT 2024), pages 44–57, San Ġiljan, Malta. Association for Computational Linguistics.
Cite (Informal):
A Comparative Analysis of Noise Reduction Methods in Sentiment Analysis on Noisy Bangla Texts (Elahi et al., WNUT-WS 2024)
PDF:
https://aclanthology.org/2024.wnut-1.5.pdf
Video:
https://aclanthology.org/2024.wnut-1.5.mp4