2020
Multi-Staged Cross-Lingual Acoustic Model Adaption for Robust Speech Recognition in Real-World Applications - A Case Study on German Oral History Interviews
Michael Gref | Oliver Walter | Christoph Schmidt | Sven Behnke | Joachim Köhler
Proceedings of the Twelfth Language Resources and Evaluation Conference
While recent automatic speech recognition systems achieve remarkable performance when large amounts of adequate, high-quality annotated speech data are available for training, the same systems often perform unsatisfactorily on tasks in domains that deviate greatly from the conditions represented by the training data. For many real-world applications, there is a lack of sufficient data that can be used directly to train robust speech recognition systems. To address this issue, we propose and investigate an approach that performs a robust acoustic model adaptation to a target domain in a cross-lingual, multi-staged manner. Our approach enables the exploitation of large-scale training data from other domains, in both the same and other languages. We evaluate our approach on the challenging task of German oral history interviews, where we achieve a relative word error rate reduction of more than 30% compared to a model trained from scratch only on the target domain, and of 6-7% compared to a model trained robustly on 1000 hours of same-language out-of-domain training data.
2018
Data-Driven Pronunciation Modeling of Swiss German Dialectal Speech for Automatic Speech Recognition
Michael Stadtschnitzer | Christoph Schmidt
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2014
Extensions of the Sign Language Recognition and Translation Corpus RWTH-PHOENIX-Weather
Jens Forster | Christoph Schmidt | Oscar Koller | Martin Bellgardt | Hermann Ney
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper introduces RWTH-PHOENIX-Weather 2014, a video-based, large-vocabulary German Sign Language corpus that has been extended over the last two years, tripling the size of the original corpus. The corpus contains weather forecasts simultaneously interpreted into sign language, recorded from German public TV and manually annotated using glosses on the sentence level, together with the spoken German extracted from the videos and semi-automatically transcribed using the open-source speech recognition system RASR. Spatial annotations of the signers’ hands as well as shape and orientation annotations of the dominant hand have been added for more than 40k and 10k video frames, respectively, creating one of the largest corpora allowing for quantitative evaluation of object tracking algorithms. Further, over 2k signs have been annotated using the SignWriting annotation system, focusing on the shape, orientation, and movement as well as spatial contacts of both hands. Finally, extended recognition and translation setups are defined, and baseline results are presented.
2013
The RWTH Aachen Machine Translation System for WMT 2013
Stephan Peitz | Saab Mansour | Jan-Thorsten Peter | Christoph Schmidt | Joern Wuebker | Matthias Huck | Markus Freitag | Hermann Ney
Proceedings of the Eighth Workshop on Statistical Machine Translation
Using viseme recognition to improve a sign language translation system
Christoph Schmidt | Oscar Koller | Hermann Ney | Thomas Hoyoux | Justus Piater
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers
Sign language-to-text translation systems are similar to spoken language translation systems in that they consist of a recognition phase and a translation phase. First, the video of a person signing is transformed into a transcription of the signs, which is then translated into the text of a spoken language. One distinctive feature of sign languages is their multi-modal nature, as they can express meaning simultaneously via hand movements, body posture and facial expressions. In some sign languages, certain signs are accompanied by mouthings, i.e. the person silently pronounces the word while signing. In this work, we closely integrate a recognition and translation framework by adding a viseme recognizer (“lip reading system”) based on an active appearance model and by optimizing the recognition system to improve the translation output. The system outperforms the standard approach of separate recognition and translation.
SIGNSPEAK: Scientific Understanding and Vision-based Technological Development for Continuous Sign Language Recognition and Translation
Jens Forster | Christoph Schmidt | Hermann Ney
Proceedings of Machine Translation Summit XIV: European projects
2012
RWTH-PHOENIX-Weather: A Large Vocabulary Sign Language Recognition and Translation Corpus
Jens Forster | Christoph Schmidt | Thomas Hoyoux | Oscar Koller | Uwe Zelle | Justus Piater | Hermann Ney
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper introduces the RWTH-PHOENIX-Weather corpus, a video-based, large-vocabulary corpus of German Sign Language suitable for statistical sign language recognition and translation. In contrast to most available sign language data collections, the RWTH-PHOENIX-Weather corpus was not recorded for linguistic research but for use in statistical pattern recognition. The corpus contains weather forecasts recorded from German public TV, manually annotated using glosses that distinguish sign variants, with time boundaries marked on the sentence and the gloss level. Further, the spoken German weather forecast has been transcribed in a semi-automatic fashion using a state-of-the-art automatic speech recognition system. Moreover, an additional translation of the glosses into spoken German has been created to capture allowable translation variability. In addition to the corpus, experimental baseline results for hand and head tracking, statistical sign language recognition, and translation are presented.
2011
The RWTH Aachen machine translation system for IWSLT 2011
Joern Wuebker | Matthias Huck | Saab Mansour | Markus Freitag | Minwei Feng | Stephan Peitz | Christoph Schmidt | Hermann Ney
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, the statistical machine translation (SMT) systems developed by RWTH Aachen University for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2011 are presented. We participated in the MT (English-French, Arabic-English, Chinese-English) and SLT (English-French) tracks. Both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated, including domain adaptation via monolingual and bilingual data selection, phrase training, different lexical smoothing methods, additional reordering models for the hierarchical system, various Arabic and Chinese segmentation methods, punctuation prediction for speech recognition output, and system combination. By applying these methods, we show considerable improvements over the respective baseline systems.
The RWTH Aachen Machine Translation System for WMT 2011
Matthias Huck | Joern Wuebker | Christoph Schmidt | Markus Freitag | Stephan Peitz | Daniel Stein | Arnaud Dagnelies | Saab Mansour | Gregor Leusch | Hermann Ney
Proceedings of the Sixth Workshop on Statistical Machine Translation
2010
Sign language machine translation overkill
Daniel Stein | Christoph Schmidt | Hermann Ney
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers
Sign languages represent an interesting niche for statistical machine translation that is typically hampered by the scarceness of suitable data, and most papers in this area apply only a few, well-known techniques and do not adapt them to small-sized corpora. In this paper, we will propose new methods for common approaches like scaling factor optimization and alignment merging strategies which helped improve our baseline. We also conduct experiments with different decoders and employ state-of-the-art techniques like soft syntactic labels as well as trigger-based and discriminative word lexica and system combination. All methods are evaluated on one of the largest sign language corpora available.