2024
SemEval-2024 Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes
Timothee Mickus | Elaine Zosa | Raul Vazquez | Teemu Vahtola | Jörg Tiedemann | Vincent Segonne | Alessandro Raganato | Marianna Apidianaki
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
This paper presents the results of SHROOM, a shared task focused on detecting hallucinations: outputs from natural language generation (NLG) systems that are fluent yet inaccurate. Such cases of overgeneration jeopardize many NLG applications, where correctness is often mission-critical. The shared task was conducted with a newly constructed dataset of 4000 model outputs, each labeled by 5 annotators, spanning 3 NLP tasks: machine translation, paraphrase generation and definition modeling. The shared task was tackled by a total of 58 different users grouped into 42 teams, of which 26 elected to write a system description paper; collectively, they submitted over 300 prediction sets on both tracks of the shared task. We observe a number of key trends in how the task was tackled: many participants rely on a handful of models, and often rely either on synthetic data for fine-tuning or on zero-shot prompting strategies. While a majority of the teams did outperform our proposed baseline system, the performance of the top-scoring systems is still consistent with random handling of the more challenging items.
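For illustration only: a minimal sketch of the zero-shot prompting family of strategies mentioned above, cast as NLI-style zero-shot classification with the HuggingFace transformers library. The model choice, label wording and threshold are assumptions, not any submitted system.

```python
# Minimal sketch of a zero-shot hallucination detector, in the spirit of the
# zero-shot strategies mentioned above. Not a submitted system; model name,
# labels and threshold are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")  # assumed NLI model

def is_hallucination(source: str, output: str, threshold: float = 0.5) -> bool:
    """Flag a generated output as a hallucination w.r.t. its source input."""
    result = classifier(
        f"Source: {source}\nOutput: {output}",
        candidate_labels=["faithful to the source", "hallucinated"],
    )
    scores = dict(zip(result["labels"], result["scores"]))
    return scores["hallucinated"] >= threshold

print(is_hallucination("The cat sat on the mat.", "A dog slept on the sofa."))
```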
MAMMOTH: Massively Multilingual Modular Open Translation @ Helsinki
Timothee Mickus | Stig-Arne Grönroos | Joseph Attieh | Michele Boggia | Ona De Gibert | Shaoxiong Ji | Niki Andreas Loppi | Alessandro Raganato | Raúl Vázquez | Jörg Tiedemann
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
NLP in the age of monolithic large language models is approaching its limits in terms of size and the amount of information that can be handled. The trend is moving toward modularization, a necessary step in the direction of designing smaller sub-networks and components with specialized functionality. In this paper, we present the MAMMOTH toolkit: a framework designed for training massively multilingual modular machine translation systems at scale, initially derived from OpenNMT-py and then adapted to ensure efficient training across computation clusters. We showcase its efficiency across clusters of A100 and V100 NVIDIA GPUs, and discuss our design philosophy and plans for future work. The toolkit is publicly available online at https://github.com/Helsinki-NLP/mammoth.
Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024)
Raúl Vázquez | Hande Celikkanat | Dennis Ulmer | Jörg Tiedemann | Swabha Swayamdipta | Wilker Aziz | Barbara Plank | Joris Baan | Marie-Catherine de Marneffe
Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024)
Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)
Raúl Vázquez | Timothee Mickus | Jörg Tiedemann | Ivan Vulić | Ahmet Üstün
Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)
Findings of the AmericasNLP 2024 Shared Task on Machine Translation into Indigenous Languages
Abteen Ebrahimi | Ona de Gibert | Raul Vazquez | Rolando Coto-Solano | Pavel Denisov | Robert Pugh | Manuel Mager | Arturo Oncevay | Luis Chiruzzo | Katharina von der Wense | Shruti Rijhwani
Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024)
This paper presents the findings of the third iteration of the AmericasNLP Shared Task on Machine Translation. This year’s competition features eleven Indigenous languages found across North, Central, and South America. A total of six teams participate, with 157 submissions across all languages and models. Two baselines – the Sheffield and Helsinki systems from 2023 – are provided and represent hard-to-beat starting points for the competition. In addition to the baselines, teams are given access to a new repository of training data, consisting of data collected by teams in prior shared tasks. Using ChrF++ as the main competition metric, we see improvements over the baseline for 4 languages: Chatino, Guarani, Quechua, and Rarámuri, with performance increases of 4.2 ChrF++ over the best baseline. In this work, we present a summary of the submitted systems, results, and a human evaluation of system outputs for Bribri, which consists of both (1) a rating of meaning and fluency and (2) a qualitative error analysis of outputs from the best submitted system.
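For reference, the chrF++ metric used here can be computed with the sacrebleu library; a minimal sketch, with toy strings standing in for actual system outputs and references:

```python
# Minimal sketch of scoring MT output with chrF++ (the main competition metric)
# using sacrebleu; the example sentences are placeholders, not shared-task data.
from sacrebleu.metrics import CHRF

# word_order=2 turns chrF into chrF++ (character n-grams plus word uni- and bigrams).
chrf_pp = CHRF(word_order=2)

hypotheses = ["the house is small", "the cat sleeps"]
references = [["the house is tiny", "the cat is sleeping"]]  # one reference stream

print(chrf_pp.corpus_score(hypotheses, references))  # prints something like "chrF2++ = ..."
```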
I Have an Attention Bridge to Sell You: Generalization Capabilities of Modular Translation Architectures
Timothee Mickus | Raul Vazquez | Joseph Attieh
Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
Modularity is a paradigm of machine translation with the potential to bring forth models that are large at training time and small during inference. Within this field of study, modular approaches, and in particular attention bridges, have been argued to improve the generalization capabilities of models by fostering language-independent representations. In the present paper, we study whether modularity affects translation quality, as well as how well modular architectures generalize across different evaluation scenarios. For a given computational budget, we find non-modular architectures to always be comparable or preferable to all modular designs we study.
2023
Why Bother with Geometry? On the Relevance of Linear Decompositions of Transformer Embeddings
Timothee Mickus | Raúl Vázquez
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
A recent body of work has demonstrated that Transformer embeddings can be linearly decomposed into well-defined sums of factors that can in turn be related to specific network inputs or components. There is, however, still a dearth of work studying whether these mathematical reformulations are empirically meaningful. In the present work, we study representations from machine-translation decoders using two such embedding decomposition methods. Our results indicate that, while decomposition-derived indicators effectively correlate with model performance, variation across different runs suggests a more nuanced take on this question. The high variability of our measurements indicates that geometry reflects model-specific characteristics more than it does sentence-specific computations, and that similar training conditions do not guarantee similar vector spaces.
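As a schematic illustration of the kind of linear decomposition at issue (an assumed generic form; the two decomposition methods studied differ in their exact terms):

```latex
% Schematic linear decomposition of a Transformer embedding e_t
% (illustrative only; the exact terms depend on the decomposition method).
\begin{equation*}
\mathbf{e}_t \;=\;
\underbrace{\mathbf{i}_t}_{\text{input embedding}}
\;+\; \sum_{\ell=1}^{L} \underbrace{\mathbf{a}^{(\ell)}_t}_{\text{attention, layer } \ell}
\;+\; \sum_{\ell=1}^{L} \underbrace{\mathbf{f}^{(\ell)}_t}_{\text{feed-forward, layer } \ell}
\;+\; \underbrace{\mathbf{c}_t}_{\text{biases and corrections}}
\end{equation*}
```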
Four Approaches to Low-Resource Multilingual NMT: The Helsinki Submission to the AmericasNLP 2023 Shared Task
Ona De Gibert | Raúl Vázquez | Mikko Aulamo | Yves Scherrer | Sami Virpioja | Jörg Tiedemann
Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
The Helsinki-NLP team participated in the AmericasNLP 2023 Shared Task with 6 submissions for all 11 language pairs arising from 4 different multilingual systems. We provide a detailed look at the work that went into collecting and preprocessing the data that led to our submissions. We explore various setups for multilingual Neural Machine Translation (NMT), namely knowledge distillation and transfer learning, multilingual NMT including a high-resource language (English), language-specific fine-tuning, and multilingual NMT exclusively using low-resource data. Our multilingual Model B ranks first in 4 out of the 11 language pairs.
Dozens of Translation Directions or Millions of Shared Parameters? Comparing Two Types of Multilinguality in Modular Machine Translation
Michele Boggia | Stig-Arne Grönroos | Niki Loppi | Timothee Mickus | Alessandro Raganato | Jörg Tiedemann | Raúl Vázquez
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
There are several ways of implementing multilingual NLP systems but little consensus as to whether different approaches exhibit similar effects. Are the trends that we observe when adding more languages the same as those we observe when sharing more parameters? We focus on encoder representations drawn from modular multilingual machine translation systems in an English-centric scenario, and study their quality from multiple aspects: how adequate they are for machine translation, how independent of the source language they are, and what semantic information they convey. Adding translation directions in English-centric scenarios does not conclusively lead to an increase in translation quality. Shared layers increase performance on zero-shot translation pairs and lead to more language-independent representations, but these improvements do not systematically align with more semantically accurate representations, from a monolingual standpoint.
2022
A Closer Look at Parameter Contributions When Training Neural Language and Translation Models
Raúl Vázquez | Hande Celikkanat | Vinit Ravishankar | Mathias Creutz | Jörg Tiedemann
Proceedings of the 29th International Conference on Computational Linguistics
We analyze the learning dynamics of neural language and translation models using Loss Change Allocation (LCA), an indicator that enables a fine-grained analysis of parameter updates when optimizing for the loss function. In other words, we can observe the contributions of different network components at training time. In this article, we systematically study masked language modeling, causal language modeling, and machine translation. We show that the choice of training objective leads to distinctive optimization procedures, even when performed on comparable Transformer architectures. We demonstrate how the various Transformer parameters are used during training, finding that the feed-forward components of each layer are the main contributors to the optimization procedure. Finally, we find that the learning dynamics are not affected by data size and distribution but rather determined by the learning objective.
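Loss Change Allocation attributes the loss change at each optimization step to individual parameters through a first-order approximation (gradient times parameter update). A minimal PyTorch sketch of this bookkeeping, assuming generic `model`, `loss_fn`, `optimizer` and `data_loader` placeholders rather than the paper's actual code:

```python
# Minimal sketch of Loss Change Allocation (LCA): attribute the per-step loss change
# to each parameter tensor via   lca_i(t) ≈ grad_i(t) · (theta_i(t+1) - theta_i(t)).
# `model`, `loss_fn`, `optimizer` and `data_loader` are assumed placeholders.
import torch

def train_with_lca(model, loss_fn, optimizer, data_loader, num_steps=100):
    lca = {name: 0.0 for name, _ in model.named_parameters()}  # one accumulator per tensor
    step = 0
    for inputs, targets in data_loader:
        if step >= num_steps:
            break
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()

        # Snapshot parameters and gradients before the update.
        before = {name: p.detach().clone() for name, p in model.named_parameters()}
        grads = {name: p.grad.detach().clone()
                 for name, p in model.named_parameters() if p.grad is not None}
        optimizer.step()

        # Allocate the approximate loss change to each parameter tensor.
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in grads:
                    delta = p.detach() - before[name]
                    lca[name] += torch.sum(grads[name] * delta).item()
        step += 1
    return lca  # more negative values ≈ components that contributed more to reducing the loss
```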
Latest Development in the FoTran Project – Scaling Up Language Coverage in Neural Machine Translation Using Distributed Training with Language-Specific Components
Raúl Vázquez | Michele Boggia | Alessandro Raganato | Niki A. Loppi | Stig-Arne Grönroos | Jörg Tiedemann
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
We describe the enhancement of a multilingual NMT toolkit developed as part of the FoTran project. We devise our modular attention-bridge model, which connects language-specific components through a shared network layer. The system now supports distributed training over many nodes and GPUs in order to substantially scale up the number of languages that can be included in a modern neural translation architecture. The model enables the study of emerging language-agnostic representations and also provides a modular toolkit for efficient machine translation.
2021
The Helsinki submission to the AmericasNLP shared task
Raúl Vázquez | Yves Scherrer | Sami Virpioja | Jörg Tiedemann
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
The University of Helsinki participated in the AmericasNLP shared task for all ten language pairs. Our multilingual NMT models reached the first rank on all language pairs in track 1, and first rank on nine out of ten language pairs in track 2. We focused our efforts on three aspects: (1) the collection of additional data from various sources such as Bibles and political constitutions, (2) the cleaning and filtering of training data with the OpusFilter toolkit, and (3) different multilingual training techniques enabled by the latest version of the OpenNMT-py toolkit to make the most efficient use of the scarce data. This paper describes our efforts in detail.
An Empirical Investigation of Word Alignment Supervision for Zero-Shot Multilingual Neural Machine Translation
Alessandro Raganato | Raúl Vázquez | Mathias Creutz | Jörg Tiedemann
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Zero-shot translation is a fascinating feature of Multilingual Neural Machine Translation (MNMT) systems. These MNMT models are usually trained on English-centric data, i.e. with English either as the source or target language, and with a language label prepended to the input indicating the target language. However, recent work has highlighted several flaws of these models in zero-shot scenarios, where language labels are ignored and the wrong language is generated, or where different runs show highly unstable results. In this paper, we investigate the benefits of an explicit alignment to language labels in Transformer-based MNMT models in the zero-shot context, by jointly training one cross-attention head with word alignment supervision to stress the focus on the target language label. We compare and evaluate several MNMT systems on three multilingual MT benchmarks of different sizes, showing that simply supervising one cross-attention head to focus both on word alignments and language labels reduces the bias towards translating into the wrong language, improving the zero-shot performance overall. Moreover, as an additional advantage, we find that our alignment supervision leads to more stable results across different training runs.
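A minimal sketch of the kind of guided-attention auxiliary loss described above, pushing one cross-attention head towards a reference word-alignment distribution (a generic formulation under assumptions, not the exact implementation used in the paper):

```python
# Generic sketch of word-alignment supervision for one cross-attention head:
# an auxiliary loss that pushes the attention distribution of a chosen head
# toward a reference alignment distribution. Illustrative only.
import torch

def alignment_supervision_loss(attn_weights, alignment, head_idx=0, eps=1e-9):
    """
    attn_weights: (batch, num_heads, tgt_len, src_len) softmaxed cross-attention.
    alignment:    (batch, tgt_len, src_len) 0/1 word-alignment matrix, e.g. from an
                  external aligner, with the target-language label treated as an
                  aligned source position for every target token.
    """
    head = attn_weights[:, head_idx]                          # (batch, tgt_len, src_len)
    # Normalize the reference alignment into a per-target-token distribution.
    ref = alignment / (alignment.sum(dim=-1, keepdim=True) + eps)
    # Cross-entropy between the reference distribution and the supervised head.
    loss = -(ref * torch.log(head + eps)).sum(dim=-1)         # (batch, tgt_len)
    return loss.mean()

# total_loss = nmt_loss + lambda_align * alignment_supervision_loss(attn, align)
```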
On the differences between BERT and MT encoder spaces and how to address them in translation tasks
Raúl Vázquez | Hande Celikkanat | Mathias Creutz | Jörg Tiedemann
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop
Various studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation despite their enormous success in other tasks. This is even more astonishing considering the similarities between the architectures. This paper sheds some light on the embedding spaces they create, using average cosine similarity, contextuality metrics and measures of representational similarity for comparison, revealing that BERT and NMT encoder representations look significantly different from one another. In order to address this issue, we propose a supervised transformation from one space into the other using explicit alignment and fine-tuning. Our results demonstrate the need for such a transformation to improve the applicability of BERT in MT.
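One of the comparison tools mentioned, average pairwise cosine similarity among contextual token embeddings, can be sketched as follows; random tensors stand in for actual BERT and NMT encoder outputs:

```python
# Toy sketch of one comparison measure mentioned above: the average pairwise cosine
# similarity among contextual token embeddings from an encoder. Random tensors stand
# in for actual BERT / NMT encoder outputs.
import torch
import torch.nn.functional as F

def average_pairwise_cosine(embeddings: torch.Tensor) -> float:
    """embeddings: (num_tokens, hidden_dim) contextual vectors from one encoder."""
    normed = F.normalize(embeddings, dim=-1)
    sims = normed @ normed.T                       # (num_tokens, num_tokens)
    n = sims.size(0)
    off_diag = sims.sum() - sims.diagonal().sum()  # exclude self-similarity
    return (off_diag / (n * (n - 1))).item()

bert_like = torch.randn(128, 768)   # placeholder for BERT token embeddings
nmt_like = torch.randn(128, 512)    # placeholder for NMT encoder token embeddings
print(average_pairwise_cosine(bert_like), average_pairwise_cosine(nmt_like))
```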
2020
A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation
Raúl Vázquez | Alessandro Raganato | Mathias Creutz | Jörg Tiedemann
Computational Linguistics, Volume 46, Issue 2 - June 2020
Neural machine translation has considerably improved the quality of automatic translations by learning good representations of input sentences. In this article, we explore a multilingual translation model capable of producing fixed-size sentence representations by incorporating an intermediate crosslingual shared layer, which we refer to as attention bridge. This layer exploits the semantics from each language and develops into a language-agnostic meaning representation that can be efficiently used for transfer learning. We systematically study the impact of the size of the attention bridge and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that there is no conflict between translation performance and the use of sentence representations in downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks. Nevertheless, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. Similarly, we show that trainable downstream tasks benefit from multilingual models, whereas additional language signals do not improve performance in non-trainable benchmarks. This is an important insight that helps to properly design models for specific applications. Finally, we also include an in-depth analysis of the proposed attention bridge and its ability to encode linguistic properties. We carefully analyze the information that is captured by individual attention heads and identify interesting patterns that explain the performance of specific settings in linguistic probing tasks.
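The attention bridge referred to here is an inner-attention (self-attention pooling) layer that maps variable-length encoder states to a fixed number of vectors. A minimal sketch under assumed dimensions (not the released implementation):

```python
# Minimal sketch of an inner-attention "attention bridge": a shared layer that pools a
# variable-length encoder output into a fixed number of sentence-representation vectors.
# Dimensions and naming are assumptions; not the authors' released code.
import torch
import torch.nn as nn

class AttentionBridge(nn.Module):
    def __init__(self, hidden_dim: int, bridge_dim: int, num_heads: int):
        super().__init__()
        # Structured self-attention: A = softmax(W2 · tanh(W1 · H))
        self.w1 = nn.Linear(hidden_dim, bridge_dim, bias=False)
        self.w2 = nn.Linear(bridge_dim, num_heads, bias=False)

    def forward(self, encoder_states: torch.Tensor) -> torch.Tensor:
        """encoder_states: (batch, src_len, hidden_dim) -> (batch, num_heads, hidden_dim)."""
        scores = self.w2(torch.tanh(self.w1(encoder_states)))  # (batch, src_len, num_heads)
        attn = torch.softmax(scores, dim=1)                     # normalize over source positions
        return attn.transpose(1, 2) @ encoder_states            # fixed-size representation

bridge = AttentionBridge(hidden_dim=512, bridge_dim=256, num_heads=10)
sentence_repr = bridge(torch.randn(2, 31, 512))  # two sentences of length 31
print(sentence_repr.shape)                       # torch.Size([2, 10, 512])
```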
The University of Helsinki Submission to the IWSLT2020 Offline Speech Translation Task
Raúl Vázquez | Mikko Aulamo | Umut Sulubacak | Jörg Tiedemann
Proceedings of the 17th International Conference on Spoken Language Translation
This paper describes the University of Helsinki Language Technology group’s participation in the IWSLT 2020 offline speech translation task, addressing the translation of English audio into German text. In line with this year’s task objective, we train both cascade and end-to-end systems for spoken language translation. We opt for an end-to-end multitasking architecture with shared internal representations and a cascade approach that follows a standard procedure consisting of ASR, correction, and MT stages. We also describe the experiments that served as a basis for the submitted systems. Our experiments reveal that multitasking training with shared internal representations is not only possible but allows for knowledge transfer across modalities.
2019
An Evaluation of Language-Agnostic Inner-Attention-Based Representations in Machine Translation
Alessandro Raganato | Raúl Vázquez | Mathias Creutz | Jörg Tiedemann
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
In this paper, we explore a multilingual translation model with a cross-lingually shared layer that can be used as fixed-size sentence representation in different downstream tasks. We systematically study the impact of the size of the shared layer and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that the performance in translation does correlate with trainable downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks. On the other hand, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. We hypothesize that the training procedure on the downstream task enables the model to identify the encoded information that is useful for the specific task whereas non-trainable benchmarks can be confused by other types of information also encoded in the representation of a sentence.
Multilingual NMT with a Language-Independent Attention Bridge
Raúl Vázquez | Alessandro Raganato | Jörg Tiedemann | Mathias Creutz
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
In this paper, we propose an architecture for machine translation (MT) capable of obtaining multilingual sentence representations by incorporating an intermediate attention bridge that is shared across all languages. We train the model with language-specific encoders and decoders that are connected through an inner-attention layer on the encoder side. The attention bridge exploits the semantics from each language for translation and develops into a language-agnostic meaning representation that can efficiently be used for transfer learning. We present a new framework for the efficient development of multilingual neural machine translation (NMT) using this model and scheduled training. We have tested the approach in a systematic way with a multi-parallel data set. The model achieves substantial improvements over strong bilingual models and performs well for zero-shot translation, which demonstrates its capacity for abstraction and transfer learning.
The University of Helsinki Submissions to the WMT19 News Translation Task
Aarne Talman | Umut Sulubacak | Raúl Vázquez | Yves Scherrer | Sami Virpioja | Alessandro Raganato | Arvi Hurskainen | Jörg Tiedemann
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
In this paper we present the University of Helsinki submissions to the WMT 2019 shared news translation task in three language pairs: English-German, English-Finnish and Finnish-English. This year we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German we trained sentence-level Transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish we focused on different segmentation approaches, and we also included a rule-based system for English-Finnish.
The University of Helsinki Submissions to the WMT19 Similar Language Translation Task
Yves Scherrer | Raúl Vázquez | Sami Virpioja
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
This paper describes the University of Helsinki Language Technology group’s participation in the WMT 2019 similar language translation task. We trained neural machine translation models for the language pairs Czech <-> Polish and Spanish <-> Portuguese. Our experiments focused on different subword segmentation methods, and in particular on the comparison of a cognate-aware segmentation method, Cognate Morfessor, with character segmentation and unsupervised segmentation methods for which the data from different languages were simply concatenated. We did not observe major benefits from cognate-aware segmentation methods, but further research may be needed to explore larger parts of the parameter space. Character-level models proved to be competitive for translation between Spanish and Portuguese, but they are slower in training and decoding.
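Two of the segmentation baselines discussed above, unsupervised subword and character segmentation, can be sketched with the SentencePiece library; the training file path is a placeholder, and Cognate Morfessor itself is a separate tool not shown here:

```python
# Sketch of two segmentation baselines mentioned above (unsupervised subword vs.
# character segmentation) using SentencePiece; 'train.cs-pl.txt' is a placeholder
# path, and Cognate Morfessor is a separate tool not covered by this sketch.
import sentencepiece as spm

# Unsupervised subword segmentation (BPE) over concatenated data from both languages.
spm.SentencePieceTrainer.train(
    input="train.cs-pl.txt", model_prefix="bpe_cs_pl",
    vocab_size=32000, model_type="bpe")

# Character-level segmentation of the same data (soft vocab limit, since the
# actual character inventory depends on the data).
spm.SentencePieceTrainer.train(
    input="train.cs-pl.txt", model_prefix="char_cs_pl",
    vocab_size=500, model_type="char", hard_vocab_limit=False)

sp = spm.SentencePieceProcessor(model_file="bpe_cs_pl.model")
print(sp.encode("Dobrý den, světe!", out_type=str))
```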
The University of Helsinki Submission to the WMT19 Parallel Corpus Filtering Task
Raúl Vázquez | Umut Sulubacak | Jörg Tiedemann
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
This paper describes the University of Helsinki Language Technology group’s participation in the WMT 2019 parallel corpus filtering task. Our scores were produced using a two-step strategy. First, we individually applied a series of filters to remove poor-quality sentence pairs. Then, we produced a score for each sentence pair by weighting the resulting filter features with a classification model. This methodology allowed us to build a simple and reliable system that is easily adaptable to other language pairs.
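A toy sketch of the second step described above, combining per-sentence-pair filter features with a classification model; the feature set and the logistic-regression choice are illustrative assumptions, not the exact submitted system:

```python
# Toy sketch of weighting per-sentence-pair filter features with a classifier, as in
# the two-step strategy above. Features, labels and the choice of logistic regression
# are illustrative assumptions, not the exact submitted system.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: filter outputs for one sentence pair, e.g.
# [length_ratio, language_id_confidence, alignment_score, non_alphabetic_ratio]
features = np.array([
    [1.05, 0.98, 0.71, 0.02],   # likely clean pair
    [3.80, 0.41, 0.12, 0.35],   # likely noisy pair
    [0.95, 0.99, 0.80, 0.01],
    [2.50, 0.55, 0.20, 0.40],
])
labels = np.array([1, 0, 1, 0])  # 1 = keep, 0 = filter out (from a small labeled sample)

clf = LogisticRegression().fit(features, labels)

# The predicted probability of the "keep" class serves as the sentence-pair score.
new_pairs = np.array([[1.10, 0.97, 0.65, 0.03]])
print(clf.predict_proba(new_pairs)[:, 1])   # higher = more likely a good pair
```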
2018
The MeMAD Submission to the WMT18 Multimodal Translation Task
Stig-Arne Grönroos | Benoit Huet | Mikko Kurimo | Jorma Laaksonen | Bernard Merialdo | Phu Pham | Mats Sjöberg | Umut Sulubacak | Jörg Tiedemann | Raphael Troncy | Raúl Vázquez
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems that led us to this choice. We have the top-scoring system for both English-to-German and English-to-French, according to the automatic metrics for flickr18. Our experiments show that the effect of the visual features in our system is small. Our largest gains come from the quality of the underlying text-only NMT system. We find that appropriate use of additional data is effective.