A Multi-domain Corpus of Swedish Word Sense Annotation

Richard Johansson, Yvonne Adesam, Gerlof Bouma, Karin Hedberg


Abstract
We describe the word sense annotation layer in Eukalyptus, a freely available five-domain corpus of contemporary Swedish with several annotation layers. The annotation uses the SALDO lexicon to define the sense inventory, and allows word sense annotation of compound segments and multiword units. We give an overview of the new annotation tool developed for this project, and finally present an analysis of the inter-annotator agreement between two annotators.
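The abstract does not say which agreement measure is used. As a hedged illustration only, the sketch below computes two common measures for two annotators who each assign one sense per token: raw observed agreement and Cohen's kappa, which corrects for chance. The function names and the sense labels are hypothetical. The labels merely imitate SALDO's `lemma..n` sense-identifier style and are not drawn from the corpus.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of tokens for which both annotators chose the same sense."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for the agreement expected
    by chance, estimated from each annotator's own label distribution."""
    p_o = observed_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both independently pick the same label.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout,
        # so kappa is undefined; by convention we report perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sense labels for four tokens (SALDO-style identifiers).
annotator_1 = ["fil..1", "fil..1", "fil..2", "rock..1"]
annotator_2 = ["fil..1", "fil..2", "fil..2", "rock..1"]
print(observed_agreement(annotator_1, annotator_2))  # 0.75
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.636
```

Here three of the four tokens match, giving 0.75 raw agreement. Because some of that agreement would be expected by chance, kappa is lower, at about 0.64.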
Anthology ID: L16-1482
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 3019–3022
URL: https://aclanthology.org/L16-1482
Cite (ACL): Richard Johansson, Yvonne Adesam, Gerlof Bouma, and Karin Hedberg. 2016. A Multi-domain Corpus of Swedish Word Sense Annotation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3019–3022, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): A Multi-domain Corpus of Swedish Word Sense Annotation (Johansson et al., LREC 2016)
PDF: https://aclanthology.org/L16-1482.pdf