More DWUGs: Extending and Evaluating Word Usage Graph Datasets in Multiple Languages

Dominik Schlechtweg, Pierluigi Cassotti, Bill Noble, David Alfter, Sabine Schulte im Walde, Nina Tahmasebi


Abstract
Word Usage Graphs (WUGs) represent human semantic proximity judgments for pairs of word uses in a weighted graph, which can be clustered to infer word sense clusters from simple pairwise word use judgments, avoiding the need for word sense definitions. SemEval-2020 Task 1 provided the first and, to date, largest manually annotated diachronic WUG dataset. In this paper, we check the robustness and correctness of the annotations by continuing the SemEval annotation algorithm for two more rounds and comparing the results against an established annotation paradigm. Further, we test reproducibility by resampling a new, smaller set of word uses from the SemEval source corpora and annotating them. Our work contributes to a better understanding of the problems and opportunities of the WUG annotation paradigm and points to future improvements.
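
As a rough illustration of the idea described in the abstract, the sketch below builds a small weighted graph from pairwise proximity judgments and groups the uses into clusters. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the use identifiers, the scores, the 1 to 4 judgment scale, and the 2.5 threshold are all hypothetical, and the clustering step is a simple stand-in (connected components over edges at or above the threshold) rather than the clustering algorithm applied to the actual datasets.

```python
from collections import defaultdict


def build_usage_graph(judgments):
    """Average repeated judgments of the same use pair into one edge weight.

    `judgments` is an iterable of (use_a, use_b, score) triples.
    Returns a dict mapping a sorted (use, use) pair to its mean score.
    """
    scores = defaultdict(list)
    for use_a, use_b, score in judgments:
        pair = tuple(sorted((use_a, use_b)))
        scores[pair].append(score)
    return {pair: sum(vals) / len(vals) for pair, vals in scores.items()}


def cluster_by_threshold(edges, threshold=2.5):
    """Stand-in clustering (an assumption, not the paper's method):
    keep edges whose weight is >= threshold and return the connected
    components of the remaining graph as sorted lists of uses.
    """
    nodes = set()
    adjacency = defaultdict(set)
    for (use_a, use_b), weight in edges.items():
        nodes.update((use_a, use_b))
        if weight >= threshold:
            adjacency[use_a].add(use_b)
            adjacency[use_b].add(use_a)

    clusters = []
    seen = set()
    for node in sorted(nodes):
        if node in seen:
            continue
        # Depth-first traversal collects one connected component.
        component = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical judgments for four uses of one word (scale assumed to be 1-4).
judgments = [
    ("u1", "u2", 4), ("u1", "u2", 3),  # two annotators judged the same pair
    ("u3", "u4", 4),
    ("u1", "u3", 1),
    ("u2", "u4", 2),
]

edges = build_usage_graph(judgments)
clusters = cluster_by_threshold(edges)
print(edges)
print(clusters)
```

No sense definitions enter the computation at any point: the clusters come only from the pairwise judgments, which is the property the abstract highlights.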
Anthology ID:
2024.emnlp-main.796
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
14379–14393
URL:
https://aclanthology.org/2024.emnlp-main.796/
DOI:
10.18653/v1/2024.emnlp-main.796
Cite (ACL):
Dominik Schlechtweg, Pierluigi Cassotti, Bill Noble, David Alfter, Sabine Schulte im Walde, and Nina Tahmasebi. 2024. More DWUGs: Extending and Evaluating Word Usage Graph Datasets in Multiple Languages. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 14379–14393, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
More DWUGs: Extending and Evaluating Word Usage Graph Datasets in Multiple Languages (Schlechtweg et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.796.pdf