Ulrich Schäfer

Also published as: Ulrich Schaefer, Ulrich Schafer


2024

Linguistic Obfuscation Attacks and Large Language Model Uncertainty
Sebastian Steindl | Ulrich Schäfer | Bernd Ludwig | Patrick Levi
Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024)

Large Language Models (LLMs) have taken the research field of Natural Language Processing by storm. Researchers are not only investigating their capabilities and possible applications, but also their weaknesses and how they may be exploited. This has resulted in various attacks and “jailbreaking” approaches that have gained large interest within the community. The vulnerability of LLMs to certain types of input may pose major risks regarding the real-world usage of LLMs in productive operations. We therefore investigate the relationship between an LLM’s uncertainty and its vulnerability to jailbreaking attacks. To this end, we focus on a probabilistic point of view of uncertainty and employ a state-of-the-art open-source LLM. We investigate an attack that is based on linguistic obfuscation. Our results indicate that the model is subject to a higher level of uncertainty when confronted with manipulated prompts that aim to evade security mechanisms. This study lays the foundation for future research into the link between model uncertainty and its vulnerability to jailbreaks.

Counterfactual Dialog Mixing as Data Augmentation for Task-Oriented Dialog Systems
Sebastian Steindl | Ulrich Schäfer | Bernd Ludwig
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

High-quality training data for Task-Oriented Dialog (TOD) systems is costly to come by if no corpora are available. One method to extend available data is data augmentation. Yet, the research into and adaptation of data augmentation techniques for TOD systems is limited in comparison with other data modalities. We propose a novel, causally-flavored data augmentation technique called Counterfactual Dialog Mixing (CDM) that generates realistic synthetic dialogs via counterfactuals to increase the amount of training data. We demonstrate the method on a benchmark dataset and show that a model trained to classify the counterfactuals from the original data fails to do so, which strengthens the claim of creating realistic synthetic dialogs. To evaluate the effectiveness of CDM, we train a current architecture on a benchmark dataset and compare the performance with and without CDM. By doing so, we achieve state-of-the-art on some metrics. We further investigate the external generalizability and a lower resource setting. To evaluate the models, we adopted an interactive evaluation scheme.

2023

Controlled Data Augmentation for Training Task-Oriented Dialog Systems with Low Resource Data
Sebastian Steindl | Ulrich Schäfer | Bernd Ludwig
Proceedings of the 2nd Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning

Modern dialog systems rely on Deep Learning to train transformer-based model architectures. These notoriously rely on large amounts of training data. However, the collection of conversational data is often a tedious and costly process. This is especially true for Task-Oriented Dialogs, where the system ought to help the user achieve specific tasks, such as making reservations. We investigate a controlled strategy for dialog synthesis. Our method generates utterances based on dialog annotations in a sequence-to-sequence manner. Besides exploring the viability of the approach itself, we also explore the effect of constrained beam search on the generation capabilities. Moreover, we analyze the effectiveness of the proposed method as a data augmentation by studying the impact the synthetic dialogs have on training dialog systems. We perform the experiments in multiple settings, simulating various amounts of ground-truth data. Our work shows that a controlled generation approach is a viable method to synthesize Task-Oriented Dialogs, that can in turn be used to train dialog systems. We were able to improve this process by utilizing constrained beam search.

2012

Extracting glossary sentences from scholarly articles: A comparative evaluation of pattern bootstrapping and deep analysis
Melanie Reiplinger | Ulrich Schäfer | Magdalena Wolska
Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

Towards an ACL Anthology Corpus with Logical Document Structure. An Overview of the ACL 2012 Contributed Task
Ulrich Schäfer | Jonathon Read | Stephan Oepen
Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

Combining OCR Outputs for Logical Document Structure Markup. Technical Background to the ACL 2012 Contributed Task
Ulrich Schäfer | Benjamin Weitz
Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

A Fully Coreference-annotated Corpus of Scholarly Papers from the ACL Anthology
Ulrich Schäfer | Christian Spurk | Jörg Steffen
Proceedings of COLING 2012: Posters

A Graphical Citation Browser for the ACL Anthology
Benjamin Weitz | Ulrich Schäfer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Navigation in large scholarly paper collections is tedious and not well supported in most scientific digital libraries. We describe a novel browser-based graphical tool implemented using HTML5 Canvas. It displays citation information extracted from the paper text to support useful navigation. The tool is implemented using a client/server architecture. A citation graph of the digital library is built in the memory of the server. On the client side, edges of the displayed citation (sub)graph surrounding a document are labeled with keywords signifying the kind of citation made from one document to another. These keywords were extracted using NLP tools such as tokenizer, sentence boundary detection and part-of-speech tagging applied to the text extracted from the original PDF papers (currently 22,500). By clicking on an edge, the user can inspect the corresponding citation sentence in context, in most cases even highlighted in the original PDF layout. The system is publicly accessible as part of the ACL Anthology Searchbench.

2011

Ensemble-style Self-training on Citation Classification
Cailing Dong | Ulrich Schäfer
Proceedings of 5th International Joint Conference on Natural Language Processing

The ACL Anthology Searchbench
Ulrich Schäfer | Bernd Kiefer | Christian Spurk | Jörg Steffen | Rui Wang
Proceedings of the ACL-HLT 2011 System Demonstrations

2010

Scientific Authoring Support: A Tool to Navigate in Typed Citation Graphs
Ulrich Schäfer | Uwe Kasterka
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics and Writing: Writing Processes and Authoring Aids

Towards an Integrated Architecture for Composite Language Services and Multiple Linguistic Processing Components
Arif Bramantoro | Ulrich Schäfer | Toru Ishida
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Web services are increasingly being used in the natural language processing community as a way to increase the interoperability amongst language resources. This paper extends our previous work on integrating two different platforms, i.e. Heart of Gold and Language Grid. The Language Grid is an infrastructure built on top of the Internet to provide distributed language services. Heart of Gold is known as a middleware architecture for integrating deep and shallow natural language processing components. The new feature of the integrated architecture is the combination of composite language services in the Language Grid and the multiple linguistic processing components in Heart of Gold to provide a better quality of language resources available on the Web. Thus, language resources with different characteristics can be combined based on the concept of service-oriented computing, with different treatment for each combination. Having Heart of Gold fully integrated in the Language Grid environment would contribute to the heterogeneity of language services.

DL Meet FL: A Bidirectional Mapping between Ontologies and Linguistic Knowledge
Hans-Ulrich Krieger | Ulrich Schäfer
Coling 2010: Posters

2008

Extracting and Querying Relations in Scientific Papers on Language Technology
Ulrich Schäfer | Hans Uszkoreit | Christian Federmann | Torsten Marek | Yajing Zhang
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe methods for extracting interesting factual relations from scientific texts in computational linguistics and language technology taken from the ACL Anthology. We use a hybrid NLP architecture with shallow preprocessing for increased robustness and domain-specific, ontology-based named entity recognition, followed by a deep HPSG parser running the English Resource Grammar (ERG). The extracted relations in the MRS (minimal recursion semantics) format are simplified and generalized using WordNet. The resulting “quriples” are stored in a database from where they can be retrieved (again using abstraction methods) by relation-based search. The query interface is embedded in a web browser-based application we call the Scientist’s Workbench. It supports researchers in editing and online-searching scientific papers.

2006

Middleware for Creating and Combining Multi-dimensional NLP Markup
Ulrich Schäfer
Proceedings of the 5th Workshop on NLP and XML (NLPXML-2006): Multi-Dimensional Markup in Natural Language Processing

Automatic Testing and Evaluation of Multilingual Language Technology Resources and Components
Ulrich Schäfer | Daniel Beck
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We describe SProUTomat, a tool for daily building, testing and evaluating a complex general-purpose multilingual natural language text processor including its linguistic resources (lingware). Software and lingware are developed, maintained and extended in a distributed manner by multiple authors and projects, i.e., the source code stored in a version control system is modified frequently. The modular design of different, dedicated lingware modules like tokenizers, morphology, gazetteers, type hierarchy, rule formalism on the one hand increases flexibility and re-usability, but on the other hand may lead to fragility with respect to changes. Therefore, frequent testing as known from software engineering is necessary also for lingware to warrant a high level of quality and overall stability of the system. We describe the build, testing and evaluation methods for LT software and lingware we have developed on the basis of the open source, platform-independent Apache Ant tool and the configurable evaluation tool JTaCo.

OntoNERdIE – Mapping and Linking Ontologies to Named Entity Recognition and Information Extraction Resources
Ulrich Schäfer
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We describe an implemented offline procedure that maps OWL/RDF-encoded ontologies with large, dynamically maintained instance data to named entity recognition (NER) and information extraction (IE) engine resources, preserving hierarchical concept information and links back to the ontology concepts and instances. The main motivations are (i) improving NER/IE precision and recall in closed domains, (ii) exploiting linguistic knowledge (context, inflection, anaphora) for identifying ontology instances in texts more robustly, (iii) giving full access to ontology instances and concepts in natural language processing results, e.g. for subsequent ontology queries, navigation or inference, (iv) avoiding duplication of work in development and maintenance of similar resources in independent places, namely lingware and ontologies. We show an application in hybrid deep-shallow natural language processing that is e.g. used for question analysis in closed domains. Further applications could be automatic hyperlinking or other innovative semantic-web related applications.

Preprocessing and Tokenisation Standards in DELPH-IN Tools
Benjamin Waldron | Ann Copestake | Ulrich Schäfer | Bernd Kiefer
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We discuss preprocessing and tokenisation standards within DELPH-IN, a large scale open-source collaboration providing multiple independent multilingual shallow and deep processors. We discuss (i) a component-specific XML interface format which has been used for some time to interface preprocessor results to the PET parser, and (ii) our implementation of a more generic XML interface format influenced heavily by the (ISO working draft) Morphosyntactic Annotation Framework (MAF). Our generic format encapsulates the information which may be passed from the preprocessing stage to a parser: it uses standoff-annotation, a lattice for the representation of structural ambiguity, intra-annotation dependencies and allows for highly structured annotation content. This work builds on the existing Heart of Gold middleware system, and previous work on Robust Minimal Recursion Semantics (RMRS) as part of an inter-component interface. We give examples of usage with a number of the DELPH-IN processing components and deep grammars.

2004

The DeepThought Core Architecture Framework
Ulrich Callmeier | Andreas Eisele | Ulrich Schäfer | Melanie Siegel
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

WHAT: An XSLT-based Infrastructure for the Integration of Natural Language Processing Components
Ulrich Schäfer
Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS)

Integrated Shallow and Deep Parsing: TopP Meets HPSG
Anette Frank | Markus Becker | Berthold Crysmann | Bernd Kiefer | Ulrich Schäfer
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Integrating Information Extraction and Automatic Hyperlinking
Stephan Busemann | Witold Drozdzynski | Hans-Ulrich Krieger | Jakub Piskorski | Ulrich Schaefer | Hans Uszkoreit | Feiyu Xu
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

2002

An Integrated Architecture for Shallow and Deep Processing
Berthold Crysmann | Anette Frank | Bernd Kiefer | Stefan Mueller | Guenter Neumann | Jakub Piskorski | Ulrich Schaefer | Melanie Siegel | Hans Uszkoreit | Feiyu Xu | Markus Becker | Hans-Ulrich Krieger
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

1994

TDL-A Type Description Language for Constraint-Based Grammars
Hans-Ulrich Krieger | Ulrich Schafer
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics