Non-Parametric Word Sense Disambiguation for Historical Languages

Enrique Manjavacas Arevalo, Lauren Fonteyn


Abstract
Recent approaches to Word Sense Disambiguation (WSD) have profited from the enhanced contextualized word representations coming from contemporary Large Language Models (LLMs). This advancement is accompanied by a renewed interest in WSD applications in Humanities research, where the lack of suitable, specific WSD-annotated resources is a hurdle in developing ad-hoc WSD systems. Because they can exploit sentential context, LLMs are particularly suited for disambiguation tasks. Still, the application of LLMs is often limited to linear classifiers trained on top of the LLM architecture. In this paper, we follow recent developments in non-parametric learning and show how LLMs can be efficiently fine-tuned to achieve strong few-shot performance on WSD for historical languages (English and Dutch, date range: 1450-1950). We test our hypothesis using (i) a large, general evaluation set taken from large lexical databases, and (ii) a small real-world scenario involving an ad-hoc WSD task. Moreover, this paper marks the release of GysBERT, an LLM for historical Dutch.
Anthology ID:
2022.nlp4dh-1.16
Volume:
Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities
Month:
November
Year:
2022
Address:
Taipei, Taiwan
Editors:
Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter
Venue:
NLP4DH
Publisher:
Association for Computational Linguistics
Pages:
123–134
URL:
https://aclanthology.org/2022.nlp4dh-1.16
DOI:
10.18653/v1/2022.nlp4dh-1.16
Cite (ACL):
Enrique Manjavacas Arevalo and Lauren Fonteyn. 2022. Non-Parametric Word Sense Disambiguation for Historical Languages. In Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities, pages 123–134, Taipei, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Non-Parametric Word Sense Disambiguation for Historical Languages (Manjavacas Arevalo & Fonteyn, NLP4DH 2022)
PDF:
https://aclanthology.org/2022.nlp4dh-1.16.pdf