2020
RKorAPClient: An R Package for Accessing the German Reference Corpus DeReKo via KorAP
Marc Kupietz | Nils Diewald | Eliza Margaretha
Proceedings of the Twelfth Language Resources and Evaluation Conference
Making corpora accessible and usable for linguistic research is a huge challenge in view of (too) big data, legal issues and a rapidly evolving methodology. This affects not only the design of user-friendly graphical interfaces to corpus analysis tools, but also the availability of programming interfaces supporting access to the functionality of these tools from various analysis and development environments. RKorAPClient is a new research tool in the form of an R package that interacts with the Web API of the corpus analysis platform KorAP, which provides access to large annotated corpora, including the German reference corpus DeReKo with 45 billion tokens. In addition to optionally authenticated KorAP API access, RKorAPClient provides further processing and visualization features to simplify common corpus analysis tasks. This paper introduces the basic functionality of RKorAPClient and exemplifies various analysis tasks based on DeReKo that are bundled within the R package and can serve as a basic framework for advanced analysis and visualization approaches.
2016
KorAP Architecture ― Diving in the Deep Sea of Corpus Data
Nils Diewald | Michael Hanl | Eliza Margaretha | Joachim Bingel | Marc Kupietz | Piotr Bański | Andreas Witt
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
KorAP is a corpus search and analysis platform developed at the Institute for the German Language (IDS). It supports very large corpora with multiple annotation layers, multiple query languages, and complex licensing scenarios. KorAP’s design aims to be scalable, flexible, and sustainable to serve the German Reference Corpus DeReKo for at least the next decade. To meet these requirements, we have adopted a highly modular microservice-based architecture. This paper outlines our approach: an architecture consisting of small components that are easy to extend, replace, and maintain. The components include a search backend, a user and corpus license management system, and a web-based user frontend. We also describe a general corpus query protocol used by all microservices for internal communication. KorAP is open source, licensed under BSD-2, and available on GitHub.
2011
An Approach to the Automated Evaluation of Pipeline Architectures in Natural Language Dialogue Systems
Eliza Margaretha | David DeVault
Proceedings of the SIGDIAL 2011 Conference
2008
Comparing the Value of Latent Semantic Analysis on two English-to-Indonesian lexical mapping tasks
Eliza Margaretha | Ruli Manurung
Proceedings of the Australasian Language Technology Association Workshop 2008