Polysemy through the lens of psycholinguistic variables: a dataset and an evaluation of static and contextualized language models

Andrea Bruera, Farbod Zamani, Massimo Poesio


Abstract
Polysemes are words that can have different senses depending on the context of utterance: for instance, ‘newspaper’ can refer to an organization (as in ‘manage the newspaper’) or to an object (as in ‘open the newspaper’). Contrary to a large body of evidence coming from psycholinguistics, polysemy has traditionally been modelled in NLP by assuming that each sense should be given a separate representation in a lexicon (e.g. WordNet). This has led to the current situation, in which datasets used to evaluate computational models of semantics miss crucial details about the representation of polysemes, thus limiting the amount of evidence that can be gained from their use. In this paper, we propose a framework to approach polysemy as a continuous variation in psycholinguistic properties of a word in context. This approach accommodates different sense interpretations, without postulating clear-cut jumps between senses. First, we describe a publicly available English dataset that we collected, where polysemes in context (verb-noun phrases) are annotated for their concreteness and body sensory strength. Then, we evaluate static and contextualized language models in their ability to predict the ratings of each polyseme in context, as well as in their ability to capture the distinction among senses, revealing and characterizing the models’ flaws in an interpretable way.
Anthology ID: 2024.starsem-1.3
Volume: Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Danushka Bollegala, Vered Shwartz
Venue: *SEM
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 35–48
URL: https://aclanthology.org/2024.starsem-1.3
DOI: 10.18653/v1/2024.starsem-1.3
Cite (ACL): Andrea Bruera, Farbod Zamani, and Massimo Poesio. 2024. Polysemy through the lens of psycholinguistic variables: a dataset and an evaluation of static and contextualized language models. In Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024), pages 35–48, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Polysemy through the lens of psycholinguistic variables: a dataset and an evaluation of static and contextualized language models (Bruera et al., *SEM 2024)
PDF: https://aclanthology.org/2024.starsem-1.3.pdf