2012
Towards a better understanding of statistical post-editing
Marion Potet | Laurent Besacier | Hervé Blanchon | Marwen Azouzi
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
We describe several experiments to better understand the usefulness of statistical post-editing (SPE) for improving the raw outputs of phrase-based statistical MT (PBMT) systems. Whatever the size of the training corpus, we show that SPE systems trained on general-domain data offer no breakthrough over our baseline general-domain PBMT system. However, using manually post-edited system outputs to train the SPE led to a slight improvement in translation quality compared with the use of professional reference translations. We also show that SPE is far more effective for domain adaptation, mainly because it recovers many domain-specific terms unknown to our general PBMT system. Finally, we compare two approaches to domain adaptation: post-editing a general-domain PBMT system versus building a new domain-adapted PBMT system (using two different techniques), and show that the latter outperforms the former. Yet, when the PBMT is a “black box”, SPE trained with post-edited system outputs remains an interesting option for domain adaptation.
Collection of a Large Database of French-English SMT Output Corrections
Marion Potet | Emmanuelle Esperança-Rodier | Laurent Besacier | Hervé Blanchon
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Corpus-based approaches to machine translation (MT) rely on the availability of parallel corpora. To produce user-acceptable translation outputs, such systems need high-quality data to be efficiently trained, optimized and evaluated. However, building a high-quality dataset is a relatively expensive task. In this paper, we describe the collection and analysis of a large database of 10,881 manually corrected SMT output hypotheses. These post-editions were collected using Amazon's Mechanical Turk, following some ethical guidelines. A complete analysis of the collected data showed that the corrections are of high quality: more than 87% of the collected post-editions improve the hypotheses, and more than 94% of the crowdsourced post-editions are at least of professional quality. We also post-edited 1,500 gold-standard reference translations (from bilingual parallel corpora produced by professionals) and noticed that 72% of these translations needed to be corrected during post-edition. We computed a proximity measure between the different kinds of translations and found that reference translations are as far from the hypotheses as from the corrected hypotheses (i.e. the post-editions). In light of these last findings, we discuss the suitability of text-based reference translations for training sentence-to-sentence SMT systems.
2011
The LIGA (LIG/LIA) Machine Translation System for WMT 2011
Marion Potet | Raphaël Rubino | Benjamin Lecouteux | Stéphane Huet | Laurent Besacier | Hervé Blanchon | Fabrice Lefèvre
Proceedings of the Sixth Workshop on Statistical Machine Translation
Oracle-based Training for Phrase-based Statistical Machine Translation
Marion Potet | Emmanuelle Esperança-Rodier | Hervé Blanchon | Laurent Besacier
Proceedings of the 15th Annual Conference of the European Association for Machine Translation
2010
The LIG Machine Translation System for WMT 2010
Marion Potet | Laurent Besacier | Hervé Blanchon
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
LIG statistical machine translation systems for IWSLT 2010
Laurent Besacier | Haitem Afli | Thi Ngoc Diep Do | Hervé Blanchon | Marion Potet
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign
2009
Méta-moteur de traduction automatique : proposition d’une métrique pour le classement de traductions
Marion Potet
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues
Given the growth of the Web and the spread of multilingual documents, the need for “on-the-fly” translation has become evident. This article presents a system that, for a given sentence, proposes not a single translation but a list of N translation hypotheses, obtained by calling several pre-existing translation engines. Nine free machine translation engines available on the Web were selected; the text to translate is submitted to each of them and its translation is retrieved. The resulting translations are ranked according to a metric based on a language model. The experiments conducted showed that this translation meta-engine proves more relevant than using a single translation system.
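The ranking step described in this abstract can be illustrated with a minimal sketch. It is not the metric proposed in the paper: it assumes a toy bigram language model with add-one smoothing, trained on a tiny hypothetical corpus, and scores each hypothesis by its length-normalized log-probability. The function names (`train_bigram_lm`, `score`, `rank_hypotheses`) and the example sentences are illustrative assumptions.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences.

    Assumption: a toy model for illustration, not the paper's language model.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(hypothesis, unigrams, bigrams):
    """Average add-one-smoothed bigram log-probability per transition."""
    tokens = ["<s>"] + hypothesis.lower().split() + ["</s>"]
    vocab_size = len(unigrams)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(prob)
    # Normalize by the number of transitions so long hypotheses are not penalized.
    return total / (len(tokens) - 1)


def rank_hypotheses(hypotheses, unigrams, bigrams):
    """Return hypotheses sorted from most to least fluent under the toy model."""
    return sorted(hypotheses, key=lambda h: score(h, unigrams, bigrams), reverse=True)


if __name__ == "__main__":
    # Hypothetical target-language corpus and candidate translations.
    corpus = ["the cat sat on the mat", "the dog sat on the rug"]
    candidates = ["mat the on sat cat the", "the cat sat on the mat"]
    lm = train_bigram_lm(corpus)
    for hyp in rank_hypotheses(candidates, *lm):
        print(f"{score(hyp, *lm):.3f}  {hyp}")
```

In a real meta-engine the candidates would come from the external translation engines and the language model would be trained on a large target-language corpus; the length normalization is one possible design choice to keep hypotheses of different lengths comparable.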