2015
Source discriminative word lexicon for translation disambiguation
Teresa Herrmann, Jan Niehues, Alex Waibel
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers
The Karlsruhe Institute of Technology Translation Systems for the WMT 2015
Eunah Cho, Thanh-Le Ha, Jan Niehues, Teresa Herrmann, Mohammed Mediani, Yuqi Zhang, Alex Waibel
Proceedings of the Tenth Workshop on Statistical Machine Translation
2014
Manual Analysis of Structurally Informed Reordering in German-English Machine Translation
Teresa Herrmann, Jan Niehues, Alex Waibel
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Word reordering is a difficult task for translation. Common automatic metrics such as BLEU have difficulty reflecting improvements in target-language word order. However, word order is a crucial aspect for humans when judging translation quality. This paper presents a detailed analysis of a structure-aware reordering approach applied in a German-to-English phrase-based machine translation system. We compare, sentence by sentence, the translation outputs of two translation systems applying reordering rules based on parts of speech and syntax trees. For each sentence pair we examine the global translation performance and classify local changes in the translated sentences. This analysis is applied to three data sets representing different genres. While the improvement in BLEU differed substantially between the data sets, the manual evaluation showed that both global translation performance and individual types of improvements and degradations exhibit similar behavior across the three data sets. We observed that for 55-64% of the sentences with different translations, the translation produced using the tree-based reordering was judged to be the better one. As intended by the investigated reordering model, most improvements are achieved by improving the position of the verb or by being able to translate a verb that could not be translated before.
The KIT-LIMSI Translation System for WMT 2014
Quoc Khanh Do, Teresa Herrmann, Jan Niehues, Alexandre Allauzen, François Yvon, Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation
EU-BRIDGE MT: Combined Machine Translation
Markus Freitag, Stephan Peitz, Joern Wuebker, Hermann Ney, Matthias Huck, Rico Sennrich, Nadir Durrani, Maria Nadejde, Philip Williams, Philipp Koehn, Teresa Herrmann, Eunah Cho, Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation
The Karlsruhe Institute of Technology Translation Systems for the WMT 2014
Teresa Herrmann, Mohammed Mediani, Eunah Cho, Thanh-Le Ha, Jan Niehues, Isabel Slawik, Yuqi Zhang, Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation
The KIT translation systems for IWSLT 2014
Isabel Slawik, Mohammed Mediani, Jan Niehues, Yuqi Zhang, Eunah Cho, Teresa Herrmann, Thanh-Le Ha, Alex Waibel
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we present the KIT systems participating in the TED translation tasks of the IWSLT 2014 machine translation evaluation. We submitted phrase-based translation systems for all three official directions, namely English→German, German→English, and English→French, as well as for the optional directions English→Chinese and English→Arabic. For the official directions we built systems both for the machine translation as well as the spoken language translation track. This year we improved our systems’ performance over last year through n-best list rescoring using neural network-based translation and language models and novel preordering rules based on tree information of multiple syntactic levels. Furthermore, we could successfully apply a novel phrase extraction algorithm and transliteration of unknown words for Arabic. We also submitted a contrastive system for German→English built with stemmed German adjectives. For the SLT tracks, we used a monolingual translation system to translate the lowercased ASR hypotheses with all punctuation stripped to truecased, punctuated output as a preprocessing step to our usual translation system.
2013
EU-BRIDGE MT: text translation of talks in the EU-BRIDGE project
Markus Freitag, Stephan Peitz, Joern Wuebker, Hermann Ney, Nadir Durrani, Matthias Huck, Philipp Koehn, Thanh-Le Ha, Jan Niehues, Mohammed Mediani, Teresa Herrmann, Alex Waibel, Nicola Bertoldi, Mauro Cettolo, Marcello Federico
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign
EU-BRIDGE is a European research project aimed at developing innovative speech translation technology. This paper describes one of the collaborative efforts within EU-BRIDGE to further advance the state of the art in machine translation for two European language pairs, English→French and German→English. Four research institutions involved in the EU-BRIDGE project combined their individual machine translation systems and participated with a joint setup in the machine translation track of the evaluation campaign at the 2013 International Workshop on Spoken Language Translation (IWSLT). We present the methods and techniques for achieving high translation quality in text translation of talks that are applied at RWTH Aachen University, the University of Edinburgh, Karlsruhe Institute of Technology, and Fondazione Bruno Kessler. We then show how we were able to considerably boost translation performance (as measured by the metrics BLEU and TER) by means of system combination. The joint setups yield empirical gains of up to 1.4 points in BLEU and 2.8 points in TER on the IWSLT test sets compared to the best single systems.
The KIT translation systems for IWSLT 2013
Thanh-Le Ha, Teresa Herrmann, Jan Niehues, Mohammed Mediani, Eunah Cho, Yuqi Zhang, Isabel Slawik, Alex Waibel
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we present the KIT systems participating in all three official directions, namely English→German, German→English, and English→French, in translation tasks of the IWSLT 2013 machine translation evaluation. Additionally, we present the results for our submissions to the optional directions English→Chinese and English→Arabic. We used phrase-based translation systems to generate the translations. This year, we focused on adapting the systems towards ASR input. Furthermore, we investigated different reordering models as well as an extended discriminative word lexicon. Finally, we added a data selection approach for domain adaptation.
Analyzing the potential of source sentence reordering in statistical machine translation
Teresa Herrmann, Jochen Weiner, Jan Niehues, Alex Waibel
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers
We analyze the performance of source sentence reordering, a common reordering approach, using oracle experiments on German-English and English-German translation. First, we show that the potential of this approach is very promising. Compared to a monotone translation, the optimally reordered source sentence leads to improvements of up to 4.6 and 6.2 BLEU points, depending on the language. Furthermore, we perform a detailed evaluation of the different aspects of the approach. We analyze the impact of the restriction of the search space by reordering lattices and we can show that using more complex rule types for reordering results in better approximation of the optimally reordered source. However, a gap of about 3 to 3.8 BLEU points remains, presenting a promising perspective for research on extending the search space through better reordering rules. When evaluating the ranking of different reordering variants, the results reveal that the search for the best path in the lattice performs very well for German-English translation. For English-German translation there is potential for an improvement of up to 1.4 BLEU points through a better ranking of the different reordering possibilities in the reordering lattice.
Combining Word Reordering Methods on different Linguistic Abstraction Levels for Statistical Machine Translation
Teresa Herrmann, Jan Niehues, Alex Waibel
Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation
The Karlsruhe Institute of Technology Translation Systems for the WMT 2013
Eunah Cho, Thanh-Le Ha, Mohammed Mediani, Jan Niehues, Teresa Herrmann, Isabel Slawik, Alex Waibel
Proceedings of the Eighth Workshop on Statistical Machine Translation
Joint WMT 2013 Submission of the QUAERO Project
Stephan Peitz, Saab Mansour, Matthias Huck, Markus Freitag, Hermann Ney, Eunah Cho, Teresa Herrmann, Mohammed Mediani, Jan Niehues, Alex Waibel, Alexandre Allauzen, Quoc Khanh Do, Bianka Buschbeck, Tonio Wandmacher
Proceedings of the Eighth Workshop on Statistical Machine Translation
2012
Joint WMT 2012 Submission of the QUAERO Project
Markus Freitag, Stephan Peitz, Matthias Huck, Hermann Ney, Jan Niehues, Teresa Herrmann, Alex Waibel, Hai-Son Le, Thomas Lavergne, Alexandre Allauzen, Bianka Buschbeck, Josep Maria Crego, Jean Senellart
Proceedings of the Seventh Workshop on Statistical Machine Translation
The Karlsruhe Institute of Technology Translation Systems for the WMT 2012
Jan Niehues, Yuqi Zhang, Mohammed Mediani, Teresa Herrmann, Eunah Cho, Alex Waibel
Proceedings of the Seventh Workshop on Statistical Machine Translation
The KIT translation systems for IWSLT 2012
Mohammed Mediani, Yuqi Zhang, Thanh-Le Ha, Jan Niehues, Eunah Cho, Teresa Herrmann, Rainer Kärgel, Alexander Waibel
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we present the KIT systems participating in the English-French TED Translation tasks in the framework of the IWSLT 2012 machine translation evaluation. We also present several additional experiments on the English-German, English-Chinese and English-Arabic translation pairs. Our system is a phrase-based statistical machine translation system, extended with many additional models that have been shown to improve translation quality. For instance, it uses part-of-speech (POS)-based reordering, translation and language model adaptation, a bilingual language model, a word-cluster language model, discriminative word lexica (DWL), and a continuous space language model. In addition, the system incorporates special preprocessing and postprocessing steps. In preprocessing, noisy sentence pairs are filtered out of the corpora, whereas in postprocessing, the agreement between a noun and its surrounding words in the French translation is corrected based on POS tags with morphological information. Our system handles speech transcription input by removing case information and all punctuation except periods from the text translation model.
The KIT Lecture Corpus for Speech Translation
Sebastian Stüker, Florian Kraft, Christian Mohr, Teresa Herrmann, Eunah Cho, Alex Waibel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Academic lectures offer valuable content, but often do not reach their full potential audience due to the language barrier. Human translations of lectures are too expensive to be widely used. Speech translation technology can be an affordable alternative in this case. State-of-the-art speech translation systems use statistical models that need to be trained on large amounts of in-domain data. To support the KIT lecture translation project in its effort to introduce speech translation technology in KIT's lecture halls, we have collected a corpus of German lectures at KIT. In this paper we describe how we recorded the lectures and how we annotated them. We further give detailed statistics on the types of lectures in the corpus and its size. We collected the corpus with the aim that it should not only be suitable for training a spoken language translation system in the traditional way, but should also enable us to research techniques that allow the translation system to adapt itself automatically and autonomously to the varying topics and speakers of lectures.
The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation
Marcello Federico, Sebastian Stüker, Luisa Bentivogli, Michael Paul, Mauro Cettolo, Teresa Herrmann, Jan Niehues, Giovanni Moretti
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
We report here on the eighth evaluation campaign organized in 2011 by the IWSLT workshop series. The IWSLT 2011 evaluation focused on the automatic translation of public talks and included tracks for speech recognition, speech translation, text translation, and system combination. Unlike in previous years, all data supplied for the evaluation have been publicly released on the workshop website and are at the disposal of researchers interested in working on our benchmarks and in comparing their results with those published at the workshop. This paper provides an overview of the IWSLT 2011 evaluation campaign and describes the data supplied, the evaluation infrastructure made available to participants, and the subjective evaluation carried out.
2011
The KIT English-French translation systems for IWSLT 2011
Mohammed Mediani, Eunah Cho, Jan Niehues, Teresa Herrmann, Alex Waibel
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper presents the KIT system participating in the English→French TALK Translation tasks in the framework of the IWSLT 2011 machine translation evaluation. Our system is a phrase-based translation system using POS-based reordering, extended with many additional features. First, special preprocessing is applied to the Giga corpus in order to minimize the effect of the large amount of noise it contains. In addition, the system gives more weight to the in-domain data by adapting the translation and language models and by using a word-cluster language model. Furthermore, the system is extended with a bilingual language model and a discriminative word lexicon. Because automatic speech transcription input usually has missing or incorrect punctuation marks, these marks were removed from the source training data for the SLT system.
Wider Context by Using Bilingual Language Models in Machine Translation
Jan Niehues, Teresa Herrmann, Stephan Vogel, Alex Waibel
Proceedings of the Sixth Workshop on Statistical Machine Translation
Joint WMT Submission of the QUAERO Project
Markus Freitag, Gregor Leusch, Joern Wuebker, Stephan Peitz, Hermann Ney, Teresa Herrmann, Jan Niehues, Alex Waibel, Alexandre Allauzen, Gilles Adda, Josep Maria Crego, Bianka Buschbeck, Tonio Wandmacher, Jean Senellart
Proceedings of the Sixth Workshop on Statistical Machine Translation
The Karlsruhe Institute of Technology Translation Systems for the WMT 2011
Teresa Herrmann, Mohammed Mediani, Jan Niehues, Alex Waibel
Proceedings of the Sixth Workshop on Statistical Machine Translation
2010
The Karlsruhe Institute for Technology Translation System for the ACL-WMT 2010
Jan Niehues, Teresa Herrmann, Mohammed Mediani, Alex Waibel
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
The KIT translation system for IWSLT 2010
Jan Niehues, Mohammed Mediani, Teresa Herrmann, Michael Heck, Christian Herff, Alex Waibel
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign
2009
The Universität Karlsruhe Translation System for the EACL-WMT 2009
Jan Niehues, Teresa Herrmann, Muntsin Kolss, Alex Waibel
Proceedings of the Fourth Workshop on Statistical Machine Translation
2008
Hybrid machine translation architectures within and beyond the EuroMatrix project
Andreas Eisele, Christian Federmann, Hans Uszkoreit, Hervé Saint-Amand, Martin Kay, Michael Jellinghaus, Sabine Hunsicker, Teresa Herrmann, Yu Chen
Proceedings of the 12th Annual Conference of the European Association for Machine Translation
Using Moses to Integrate Multiple Rule-Based Machine Translation Engines into a Hybrid System
Andreas Eisele, Christian Federmann, Hervé Saint-Amand, Michael Jellinghaus, Teresa Herrmann, Yu Chen
Proceedings of the Third Workshop on Statistical Machine Translation