Comparison of Genres in Word Sense Disambiguation using Automatically Generated Text Collections

Angelina Bolshina, Natalia Loukachevitch


Abstract
The best approaches in Word Sense Disambiguation (WSD) are supervised and rely on large amounts of hand-labelled data, which are not always available and are costly to create. We describe an approach for creating an automatically labelled collection for Russian based on monosemous relatives (related unambiguous entries). The main contribution of our work is that we extract monosemous relatives that can be located at relatively long distances from a target ambiguous word and rank them by their similarity to the target sense. We evaluate word sense disambiguation models based on nearest-neighbour classification over BERT and ELMo embeddings, using two text collections. Our work relies on the Russian wordnet RuWordNet.
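The nearest-neighbour classification mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors are toy stand-ins for BERT or ELMo contextual embeddings, the English sense labels are purely illustrative, and the names `cosine` and `knn_sense` are hypothetical helpers introduced here.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_sense(target_vec, labelled, k=1):
    """Return the majority sense among the k labelled vectors most similar to target_vec.

    `labelled` is a list of (vector, sense) pairs, e.g. embeddings of contexts
    from an automatically labelled collection (assumed, not taken from the paper).
    """
    ranked = sorted(labelled, key=lambda item: cosine(target_vec, item[0]), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training data: hand-made vectors standing in for contextual embeddings.
train = [
    ([0.9, 0.1, 0.0], "bank/finance"),
    ([0.8, 0.2, 0.1], "bank/finance"),
    ([0.1, 0.9, 0.2], "bank/river"),
    ([0.0, 0.8, 0.3], "bank/river"),
]

query = [0.2, 0.7, 0.1]  # embedding of an ambiguous occurrence (toy values)
predicted = knn_sense(query, train, k=3)
print(predicted)
```

In a real setting the vectors would come from a pretrained encoder, and the labelled pairs would be contexts of monosemous relatives tagged with the sense of the target word they stand in for.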
Anthology ID:
2020.clib-1.17
Volume:
Proceedings of the 4th International Conference on Computational Linguistics in Bulgaria (CLIB 2020)
Month:
September
Year:
2020
Address:
Sofia, Bulgaria
Venue:
CLIB
Publisher:
Department of Computational Linguistics, IBL -- BAS
Pages:
155–164
URL:
https://aclanthology.org/2020.clib-1.17
Cite (ACL):
Angelina Bolshina and Natalia Loukachevitch. 2020. Comparison of Genres in Word Sense Disambiguation using Automatically Generated Text Collections. In Proceedings of the 4th International Conference on Computational Linguistics in Bulgaria (CLIB 2020), pages 155–164, Sofia, Bulgaria. Department of Computational Linguistics, IBL -- BAS.
Cite (Informal):
Comparison of Genres in Word Sense Disambiguation using Automatically Generated Text Collections (Bolshina & Loukachevitch, CLIB 2020)
PDF:
https://aclanthology.org/2020.clib-1.17.pdf