The Effect of Unobserved Word-Context Co-occurrences on a VectorMixture Approach for Compositional Distributional Semantics

Amir Bakarov


Abstract
Swivel (Submatrix-WIse Vector Embedding Learner) is a count-based distributional semantic model built on point-wise mutual information (PMI) values. It can capture word-context co-occurrences in the PMI matrix that were never observed in the training corpus. This model outperforms mainstream word embedding training algorithms such as Continuous Bag-of-Words, GloVe and Skip-Gram on word similarity and word analogy tasks. However, the validity of these intrinsic tasks is open to question, and it is unclear whether the ability to account for unobserved word-context co-occurrences also helps in downstream tasks. In this work, we compare Word2Vec and Swivel on two downstream tasks based on natural language sentence matching: paraphrase detection and textual entailment. We find that Swivel outperforms Word2Vec in both cases, but the difference is minuscule. We conclude that the ability to learn embeddings for rarely co-occurring words is not especially crucial for downstream tasks.
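The two ideas the abstract relies on can be illustrated with a small sketch, offered under stated assumptions rather than as the paper's implementation. As we understand the original Swivel formulation, observed word-context cells are fit to their PMI with a weighted squared error, while an unobserved cell gets a PMI* computed as if its count were 1 and is trained with a soft hinge loss that mainly penalises overestimating it. Here the confidence weight is simply fixed to 1. The "vector mixture" part assumes a sentence is represented by the average of its word vectors and that two sentences are compared by cosine similarity; the paper's actual pipeline for paraphrase detection and entailment may differ. All helper names (`pmi_table`, `softplus`, `cell_loss`, `mix`, `cosine`) are hypothetical.

```python
import math


def pmi_table(counts):
    """Hypothetical helper: PMI for every (word, context) cell of a count matrix.

    counts maps (word, context) -> positive co-occurrence count; pairs absent
    from the dict were never observed. Following the Swivel idea, an unobserved
    cell gets PMI* computed as if its count were 1. Returns a dict mapping
    (word, context) -> (pmi_value, observed_flag).
    """
    total = sum(counts.values())
    row_sums, col_sums = {}, {}
    for (word, ctx), n in counts.items():
        row_sums[word] = row_sums.get(word, 0) + n
        col_sums[ctx] = col_sums.get(ctx, 0) + n

    table = {}
    for word, row in row_sums.items():
        for ctx, col in col_sums.items():
            n = counts.get((word, ctx), 0)
            observed = n > 0
            smoothed = n if observed else 1  # PMI*: pretend the count is 1
            pmi = math.log(smoothed) + math.log(total) - math.log(row) - math.log(col)
            table[(word, ctx)] = (pmi, observed)
    return table


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def cell_loss(dot, pmi, observed, weight=1.0):
    """Per-cell loss for an embedding dot product w_i . c_j (hypothetical helper).

    Observed cells: weighted squared error against PMI.
    Unobserved cells: soft hinge log(1 + exp(dot - PMI*)), which grows roughly
    linearly once the dot product exceeds PMI* and fades towards zero below it.
    The weight defaults to 1.0, a simplification of Swivel's confidence function.
    """
    if observed:
        return 0.5 * weight * (dot - pmi) ** 2
    return softplus(dot - pmi)


def mix(tokens, vectors, dim):
    """Additive vector mixture: mean of the vectors of in-vocabulary tokens."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

For example, with toy counts {(cat, purrs): 3, (dog, barks): 3, (dog, purrs): 1}, the pair (cat, barks) never occurs, yet `pmi_table` still assigns it PMI* = log(7/9) instead of leaving the cell undefined; this is the kind of unobserved co-occurrence the abstract refers to.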
Anthology ID: 2018.clib-1.19
Volume: Proceedings of the Third International Conference on Computational Linguistics in Bulgaria (CLIB 2018)
Month: May
Year: 2018
Address: Sofia, Bulgaria
Venue: CLIB
Publisher: Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences
Pages: 153–161
URL: https://aclanthology.org/2018.clib-1.19/
Cite (ACL): Amir Bakarov. 2018. The Effect of Unobserved Word-Context Co-occurrences on a VectorMixture Approach for Compositional Distributional Semantics. In Proceedings of the Third International Conference on Computational Linguistics in Bulgaria (CLIB 2018), pages 153–161, Sofia, Bulgaria. Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences.
Cite (Informal): The Effect of Unobserved Word-Context Co-occurrences on a VectorMixture Approach for Compositional Distributional Semantics (Bakarov, CLIB 2018)
PDF: https://aclanthology.org/2018.clib-1.19.pdf