Parsing Early New High German: Benefits and limitations of cross-dialectal training

Christopher Sapp, Daniel Dakota, Elliott Evans


Abstract
Historical treebanking within the generative framework has gained in popularity. However, many languages and historical periods are not yet represented. For German, a constituency treebank exists for historical Low German, but not for Early New High German. We begin to fill this gap by presenting our initial work on the Parsed Corpus of Early New High German (PCENHG). We present the methodological considerations and workflow for the treebank’s annotation and development. Given the limited amount of currently available PCENHG treebank data, we treat it as a low-resource language and leverage a larger, closely related variety (Middle Low German) to build a parser that facilitates faster post-annotation correction. We present an analysis of annotation speeds and conclude with a small pilot use case, highlighting the potential for future linguistic analyses. In doing so, we highlight the value of the treebank’s development for historical linguistic analysis and demonstrate the benefits and challenges of developing a parser using two closely related historical Germanic varieties.
Anthology ID:
2023.tlt-1.6
Volume:
Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)
Month:
March
Year:
2023
Address:
Washington, D.C.
Editors:
Daniel Dakota, Kilian Evang, Sandra Kübler, Lori Levin
Venues:
TLT | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
54–66
URL:
https://aclanthology.org/2023.tlt-1.6
Cite (ACL):
Christopher Sapp, Daniel Dakota, and Elliott Evans. 2023. Parsing Early New High German: Benefits and limitations of cross-dialectal training. In Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023), pages 54–66, Washington, D.C. Association for Computational Linguistics.
Cite (Informal):
Parsing Early New High German: Benefits and limitations of cross-dialectal training (Sapp et al., TLT-SyntaxFest 2023)
PDF:
https://aclanthology.org/2023.tlt-1.6.pdf