Word sense discovery based on sense descriptor dissimilarity

Reinhard Rapp


Abstract
In machine translation, information on word ambiguities is usually provided by the lexicographers who construct the lexicon. In this paper we propose an automatic method for word sense induction, i.e. for the discovery of a set of sense descriptors for a given ambiguous word. The approach is based on the statistics of the distributional similarity between the words in a corpus. Our algorithm works as follows: The 20 strongest first-order associations to the ambiguous word are considered as sense descriptor candidates. All pairs of these candidates are ranked according to the following two criteria: First, the two words in a pair should be as dissimilar as possible. Second, although dissimilar, their co-occurrence vectors should add up to the co-occurrence vector of the ambiguous word scaled by two. Together, the two conditions give preference to pairs whose co-occurring words are complementary. For best results, our implementation uses singular value decomposition, entropy-based weights, and second-order similarity metrics.
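The pair-ranking step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: cosine similarity as the (dis)similarity measure, multiplying the two criteria into a single score, the function names `cosine` and `rank_descriptor_pairs`, and the toy four-dimensional vectors. The sketch also omits the singular value decomposition, entropy-based weighting, and second-order similarity that the abstract mentions.

```python
from itertools import combinations

import numpy as np


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def rank_descriptor_pairs(target_vec, candidates):
    """Rank all candidate pairs for one ambiguous word (hypothetical helper).

    target_vec: co-occurrence vector of the ambiguous word.
    candidates: dict mapping candidate word -> co-occurrence vector.
    Returns a list of (score, word1, word2), best pair first.
    """
    scored = []
    for (w1, v1), (w2, v2) in combinations(candidates.items(), 2):
        # Criterion 1: the two descriptors should be dissimilar.
        dissimilarity = 1.0 - cosine(v1, v2)
        # Criterion 2: their vectors should add up to twice the target vector.
        fit = cosine(v1 + v2, 2.0 * target_vec)
        # Assumed combination: product of both criteria.
        scored.append((dissimilarity * fit, w1, w2))
    return sorted(scored, reverse=True)


# Toy data (invented): context dimensions are two "tree" contexts
# followed by two "hand" contexts.
palm = np.array([2.0, 2.0, 2.0, 2.0])
candidates = {
    "tree": np.array([4.0, 4.0, 0.0, 0.0]),
    "hand": np.array([0.0, 0.0, 4.0, 4.0]),
    "beach": np.array([3.0, 5.0, 0.0, 0.0]),
}

ranking = rank_descriptor_pairs(palm, candidates)
for score, w1, w2 in ranking:
    print(f"{w1:>5} / {w2:<5} {score:.3f}")
```

On this toy data the pair tree / hand ranks first, because the two vectors share no contexts and sum exactly to twice the vector for palm, while tree / beach ranks last because the two candidates are near-duplicates. In the setting the abstract describes, the candidate set would be the 20 strongest first-order associations of the ambiguous word.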
Anthology ID:
2003.mtsummit-papers.42
Volume:
Proceedings of Machine Translation Summit IX: Papers
Month:
September 23-27
Year:
2003
Address:
New Orleans, USA
Venue:
MTSummit
URL:
https://aclanthology.org/2003.mtsummit-papers.42
Cite (ACL):
Reinhard Rapp. 2003. Word sense discovery based on sense descriptor dissimilarity. In Proceedings of Machine Translation Summit IX: Papers, New Orleans, USA.
Cite (Informal):
Word sense discovery based on sense descriptor dissimilarity (Rapp, MTSummit 2003)
PDF:
https://aclanthology.org/2003.mtsummit-papers.42.pdf