2025
LinguAIsts@DravidianLangTech 2025: Abusive Tamil and Malayalam Text targeting Women on Social Media
Dhanyashree G | Kalpana K | Lekhashree A | Arivuchudar K | Arthi R | Bommineni Sahitya | Pavithra J | Sandra Johnson
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Social media sites have become crucial venues for communication and interaction, yet they are increasingly used to perpetrate gender-based abuse, with horrific, harassing, and degrading comments targeted at women. This paper addresses the widespread problem of abusive language directed at women in two South Indian languages, Malayalam and Tamil. We were given a dataset of YouTube comments labeled Abusive and Non-Abusive, and the goal was to detect explicit abuse, implicit bias, preconceptions, and coded language. To solve this problem, we applied and compared several machine learning models, namely Support Vector Machines (SVM), Logistic Regression (LR), and Naive Bayes classifiers, to classify comments into the given categories. The models were trained and validated on the provided dataset to achieve the best performance in terms of accuracy and macro F1 score. The proposed solutions aim to support robust content moderation systems that can detect and prevent abusive language, ensuring safer online environments for women.
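Because the abstract compares SVM, LR, and Naive Bayes by accuracy and macro F1 score, here is a minimal sketch of how those two scores can be computed from gold and predicted labels and used to select a model. It uses only the Python standard library; the function names, including the hypothetical `pick_best` helper, are illustrative assumptions and not the authors' code.

```python
def _check_lengths(y_true: list[str], y_pred: list[str]) -> None:
    # Both label lists must describe the same, non-empty set of comments.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of comments whose predicted label matches the gold label."""
    _check_lengths(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1, so a rare class counts as much as a common one."""
    _check_lengths(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def pick_best(predictions: dict[str, list[str]], y_true: list[str]) -> str:
    """Hypothetical helper: return the model name with the highest macro F1."""
    return max(predictions, key=lambda name: macro_f1(y_true, predictions[name]))
```

For example, with gold labels `["Abusive", "Abusive", "Non-Abusive", "Non-Abusive"]` and predictions `["Abusive", "Non-Abusive", "Non-Abusive", "Non-Abusive"]`, accuracy is 0.75, while macro F1 averages 2/3 (Abusive) and 4/5 (Non-Abusive) to about 0.733, penalizing the missed abusive comment more visibly.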
LinguAIsts@DravidianLangTech 2025: Misogyny Meme Detection using multimodel Approach
Arthi R | Pavithra J | Dr G Manikandan | Lekhashree A | Dhanyashree G | Bommineni Sahitya | Arivuchudar K | Kalpana K
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Memes often disseminate misogynistic material, which fosters gender discrimination and stereotyping. Although social media is an effective tool of communication, it has also provided fertile ground for online abuse. The Misogyny Meme Detection Shared Task tackles this vital issue in a multilingual and multimodal setting. Our method applies NLP techniques and machine learning models to classify memes in Malayalam and Tamil, two low-resource languages. Text preprocessing includes tokenization, lemmatization, and stop word removal, after which features are extracted using TF-IDF. Using an SVM model with tuned hyperparameters, our system ranked 9th among the systems competing in the Tamil task with an F1-score of 0.71259, and 15th in the Malayalam task with an F1-score of 0.68186. This work highlights the importance of AI-based solutions for stopping online harassment and developing secure online spaces.
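The text pipeline described above (tokenization, lemmatization, stop word removal, then TF-IDF features) can be sketched roughly as follows. This is a standard-library sketch under stated assumptions: whitespace tokenization, a tiny hypothetical stop-word list, and a hypothetical identity `lemmatize` stand-in, since the abstract names no Tamil or Malayalam lemmatizer. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, which is scikit-learn's default; whether the authors used this exact variant is not stated. The SVM training step is omitted.

```python
import math
import string
from collections import Counter

# Hypothetical stop-word list for illustration only; a real system would use
# curated Tamil and Malayalam stop words.
STOP_WORDS = {"the", "a", "an", "is", "and"}


def lemmatize(token: str) -> str:
    """Hypothetical placeholder that returns the token unchanged."""
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, split on whitespace, strip ASCII punctuation, drop stop words."""
    # Splitting on whitespace keeps Indic vowel signs (combining marks) attached
    # to their words; a regex such as \w+ can split them apart.
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return [lemmatize(tok) for tok in tokens if tok and tok not in STOP_WORDS]


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """L2-normalized TF-IDF weights; out-of-vocabulary terms are ignored."""
    counts = Counter(t for t in doc if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}
```

Terms that appear in every training document (such as "sat" in a two-document toy corpus of "The cat sat." and "The dog sat!") receive the minimum IDF of 1, so rarer, more distinctive tokens dominate each normalized vector that would be passed to the classifier.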
2024
Challenges and Insights in Identifying Hate Speech and Fake News on Social Media
Shanthi Murugan | Arthi R | Boomika E | Jeyanth S | Kaviyarasu S
Proceedings of the 21st International Conference on Natural Language Processing (ICON): Shared Task on Decoding Fake Narratives in Spreading Hateful Stories (Faux-Hate)
Social media has transformed communication, but it has also brought about a number of serious problems, most notably the proliferation of hate speech and false information. Hate-related conversations are frequently fueled by misleading narratives. We address this issue by building a multiclass classification model trained on the Faux-Hate Multi-Label Dataset (Biradar et al. 2024), which consists of fraudulent hateful remarks written in code-mixed Hindi and English. The model classifies each instance by Severity (Low, Medium, High) and by Target (Individual, Organization, Religion). Evaluated separately on each task in the test dataset, the model achieves 74% for Severity and 74% for Target. The limitations and performance issues of the model are analyzed and discussed.
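The abstract does not name the underlying classifier, so the following is only a sketch of one way to predict the two attributes: it assumes one independent classifier per attribute (Severity and Target) trained on the same tokenized comments, and it swaps in a from-scratch multinomial Naive Bayes with add-one smoothing as the classifier. The `train_heads` and `predict_heads` helpers and the toy tokens are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over token counts."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        label_counts = Counter(labels)
        # Log prior: share of training comments carrying each label.
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        token_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            token_counts[label].update(doc)
        v = len(self.vocab)
        # Smoothed log P(token | class) for every vocabulary token.
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(token_counts[c].values())
            self.log_likelihood[c] = {
                tok: math.log((token_counts[c][tok] + 1) / (total + v)) for tok in self.vocab
            }
        return self

    def predict(self, doc: list[str]) -> str:
        def score(c: str) -> float:
            # Unknown tokens are skipped rather than penalized.
            return self.log_prior[c] + sum(
                self.log_likelihood[c][t] for t in doc if t in self.vocab
            )

        return max(self.classes, key=score)


def train_heads(docs: list[list[str]], severity: list[str], target: list[str]) -> dict[str, NaiveBayes]:
    """Hypothetical helper: fit one independent classifier per attribute."""
    return {
        "severity": NaiveBayes().fit(docs, severity),
        "target": NaiveBayes().fit(docs, target),
    }


def predict_heads(heads: dict[str, NaiveBayes], doc: list[str]) -> dict[str, str]:
    """Hypothetical helper: predict every attribute for one tokenized comment."""
    return {name: model.predict(doc) for name, model in heads.items()}
```

Training the two attributes separately keeps each label space small and lets each task be scored on its own, which matches reporting a separate figure for Severity and for Target; a joint model could instead capture correlations between the two attributes.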