Lluis Formiga

Also published as: Lluís Formiga


2013

The TALP-UPC Phrase-Based Translation Systems for WMT13: System Combination with Morphology Generation, Domain Adaptation and Corpus Filtering
Lluís Formiga | Marta R. Costa-jussà | José B. Mariño | José A. R. Fonollosa | Alberto Barrón-Cedeño | Lluís Màrquez
Proceedings of the Eighth Workshop on Statistical Machine Translation

The TALP-UPC Approach to System Selection: Asiya Features and Pairwise Classification Using Random Forests
Lluís Formiga | Meritxell Gonzàlez | Alberto Barrón-Cedeño | José A. R. Fonollosa | Lluís Màrquez
Proceedings of the Eighth Workshop on Statistical Machine Translation

Real-life Translation Quality Estimation for MT System Selection
Lluis Formiga | Lluis Marquez | Jaume Pujantell
Proceedings of Machine Translation Summit XIV: Papers

2012

The TALP-UPC phrase-based translation systems for WMT12: Morphology simplification and domain adaptation
Lluís Formiga | Carlos A. Henríquez Q. | Adolfo Hernández | José B. Mariño | Enric Monte | José A. R. Fonollosa
Proceedings of the Seventh Workshop on Statistical Machine Translation

Dealing with Input Noise in Statistical Machine Translation
Lluis Formiga | Jose A. R. Fonollosa
Proceedings of COLING 2012: Posters

The FAUST Corpus of Adequacy Assessments for Real-World Machine Translation Output
Daniele Pighin | Lluís Màrquez | Lluís Formiga
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present a corpus consisting of 11,292 real-world English-to-Spanish automatic translations annotated with relative (ranking) and absolute (adequate/non-adequate) quality assessments. The translation requests, collected through the popular translation portal http://reverso.net, provide a highly varied sample of real-world machine translation (MT) usage, from complete sentences to units of one or two words, from well-formed to hardly intelligible texts, from technical documents to colloquial and slang snippets. In this paper, we present 1) a preliminary annotation experiment that we carried out to select the most appropriate quality criterion to be used for these data, 2) a graph-based methodology inspired by Interactive Genetic Algorithms to reduce the annotation effort, and 3) the outcomes of the full-scale annotation experiment, which result in a valuable and original resource for the analysis and characterization of MT-output quality.

A Graph-based Strategy to Streamline Translation Quality Assessments
Daniele Pighin | Lluís Formiga | Lluís Màrquez
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

We present a detailed analysis of a graph-based annotation strategy that we employed to annotate a corpus of 11,292 real-world English-to-Spanish automatic translations with relative (ranking) and absolute (adequate/non-adequate) quality assessments. The proposed approach, inspired by previous work in Interactive Evolutionary Computation and Interactive Genetic Algorithms, results in a simpler and faster annotation process. We empirically compare the method against a traditional, explicit ranking approach, and show that the graph-based strategy 1) is considerably faster, and 2) produces consistently more reliable annotations.

Improving English to Spanish Out-of-Domain Translations by Morphology Generalization and Generation
Lluís Formiga | Adolfo Hernández | José B. Mariño | Enric Monte
Workshop on Monolingual Machine Translation

This paper presents a detailed study of a method for morphology generalization and generation to address out-of-domain translations in English-to-Spanish phrase-based MT. The paper studies whether the morphological richness of the target language causes poor-quality translation when translating out-of-domain text. Specifically, the approach first translates into simplified Spanish forms and then predicts the final inflected forms through a morphology generation step based on shallow and deep-projected linguistic information available from both the source and target-language sentences. The results highlight the importance of generalization, and therefore generation, for dealing with out-of-domain data.