SoMaJo: State-of-the-art tokenization for German web and social media texts

Thomas Proisl, Peter Uhrig


Anthology ID:
W16-2607
Volume:
Proceedings of the 10th Web as Corpus Workshop
Month:
August
Year:
2016
Address:
Berlin
Editors:
Paul Cook, Stefan Evert, Roland Schäfer, Egon Stemle
Venue:
WAC
SIG:
SIGWAC
Publisher:
Association for Computational Linguistics
Pages:
57–62
URL:
https://aclanthology.org/W16-2607
DOI:
10.18653/v1/W16-2607
Cite (ACL):
Thomas Proisl and Peter Uhrig. 2016. SoMaJo: State-of-the-art tokenization for German web and social media texts. In Proceedings of the 10th Web as Corpus Workshop, pages 57–62, Berlin. Association for Computational Linguistics.
Cite (Informal):
SoMaJo: State-of-the-art tokenization for German web and social media texts (Proisl & Uhrig, WAC 2016)
PDF:
https://aclanthology.org/W16-2607.pdf