MucLex: A German Lexicon for Surface Realisation

Kira Klimt, Daniel Braun, Daniela Schneider, Florian Matthes


Abstract
Language resources for languages other than English are often scarce. Rule-based surface realisers need elaborate lexica to generate correct language, especially for languages like German, which have many irregular word forms. In this paper, we present MucLex, a German lexicon for the Natural Language Generation task of surface realisation, based on the crowd-sourced online lexicon Wiktionary. MucLex contains more than 100,000 lemmata and more than 670,000 different word forms in a well-structured XML file and is available under the Creative Commons BY-SA 3.0 license.
Anthology ID:
2020.lrec-1.572
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4653–4657
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.572
Cite (ACL):
Kira Klimt, Daniel Braun, Daniela Schneider, and Florian Matthes. 2020. MucLex: A German Lexicon for Surface Realisation. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4653–4657, Marseille, France. European Language Resources Association.
Cite (Informal):
MucLex: A German Lexicon for Surface Realisation (Klimt et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.572.pdf
Code:
sebischair/MucLex