Definition Extraction Feature Analysis: From Canonical to Naturally-Occurring Definitions

Mireia Roig Mirapeix, Luis Espinosa Anke, Jose Camacho-Collados


Abstract
Textual definitions constitute a fundamental source of knowledge when seeking the meaning of words, and they are the cornerstone of lexical resources such as glossaries, dictionaries, encyclopedias and thesauri. In this paper, we present an in-depth analytical study of the main features relevant to the task of definition extraction. Our main goal is to study whether linguistic structures from canonical definitions (the Aristotelian or genus et differentia model) can be leveraged to retrieve definitions from corpora across different domains of knowledge and textual genres. To this end, we develop a simple linear classifier and analyze the contribution of several (sets of) linguistic features. Finally, as a result of our experiments, we also shed light on the particularities of existing benchmarks as well as the most challenging aspects of the task.
Anthology ID:
2020.cogalex-1.10
Volume:
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon
Month:
December
Year:
2020
Address:
Online
Editors:
Michael Zock, Emmanuele Chersoni, Alessandro Lenci, Enrico Santus
Venue:
CogALex
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
81–91
URL:
https://aclanthology.org/2020.cogalex-1.10
Cite (ACL):
Mireia Roig Mirapeix, Luis Espinosa Anke, and Jose Camacho-Collados. 2020. Definition Extraction Feature Analysis: From Canonical to Naturally-Occurring Definitions. In Proceedings of the Workshop on the Cognitive Aspects of the Lexicon, pages 81–91, Online. Association for Computational Linguistics.
Cite (Informal):
Definition Extraction Feature Analysis: From Canonical to Naturally-Occurring Definitions (Roig Mirapeix et al., CogALex 2020)
PDF:
https://aclanthology.org/2020.cogalex-1.10.pdf
Data
DEFT Corpus