Exploring the Potential of Large Language Models (LLMs) for Low-resource Languages: A Study on Named-Entity Recognition (NER) and Part-Of-Speech (POS) Tagging for Nepali Language

Bipesh Subedi, Sunil Regmi, Bal Krishna Bal, Praveen Acharya


Abstract
Large Language Models (LLMs) have made significant advancements in Natural Language Processing (NLP) by excelling in various NLP tasks. This study specifically focuses on evaluating the performance of LLMs for Named Entity Recognition (NER) and Part-of-Speech (POS) tagging for a low-resource language, Nepali. The aim is to study the effectiveness of these models for languages with limited resources by conducting experiments involving various parameters and fine-tuning and evaluating two datasets namely, ILPRL and EBIQUITY. In this work, we have experimented with eight LLMs for Nepali NER and POS tagging. While some prior works utilized larger datasets than ours, our contribution lies in presenting a comprehensive analysis of multiple LLMs in a unified setting. The findings indicate that NepBERTa, trained solely in the Nepali language, demonstrated the highest performance with F1-scores of 0.76 and 0.90 in ILPRL dataset. Similarly, it achieved 0.79 and 0.97 in EBIQUITY dataset for NER and POS respectively. This study not only highlights the potential of LLMs in performing classification tasks for low-resource languages but also compares their performance with that of alternative approaches deployed for the tasks.
Anthology ID:
2024.lrec-main.611
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
6974–6979
Language:
URL:
https://aclanthology.org/2024.lrec-main.611
DOI:
Bibkey:
Cite (ACL):
Bipesh Subedi, Sunil Regmi, Bal Krishna Bal, and Praveen Acharya. 2024. Exploring the Potential of Large Language Models (LLMs) for Low-resource Languages: A Study on Named-Entity Recognition (NER) and Part-Of-Speech (POS) Tagging for Nepali Language. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6974–6979, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Exploring the Potential of Large Language Models (LLMs) for Low-resource Languages: A Study on Named-Entity Recognition (NER) and Part-Of-Speech (POS) Tagging for Nepali Language (Subedi et al., LREC-COLING 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.lrec-main.611.pdf