Carlos Escolano


2024

The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs
Aleix Sant | Carlos Escolano | Audrey Mash | Francesca De Luca Fornaciari | Maite Melero
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

This paper studies gender bias in machine translation through the lens of Large Language Models (LLMs). Four widely-used test sets are employed to benchmark various base LLMs, comparing their translation quality and gender bias against state-of-the-art Neural Machine Translation (NMT) models for English to Catalan (En → Ca) and English to Spanish (En → Es) translation directions. Our findings reveal pervasive gender bias across all models, with base LLMs exhibiting a higher degree of bias compared to NMT models. To combat this bias, we explore prompt engineering techniques applied to an instruction-tuned LLM. We identify a prompt structure that significantly reduces gender bias by up to 12% on the WinoMT evaluation dataset compared to more straightforward prompts. These results significantly reduce the gender bias accuracy gap between LLMs and traditional NMT systems.

Residual Dropout: A Simple Approach to Improve Transformer’s Data Efficiency
Carlos Escolano | Francesca De Luca Fornaciari | Maite Melero
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024

Transformer models often demand a vast amount of training data to achieve the desired level of performance. However, this data requirement poses a major challenge for low-resource languages seeking access to high-quality systems, particularly in tasks like Machine Translation. To address this issue, we propose adding Dropout to Transformer’s Residual Connections. Our experimental results demonstrate that this modification effectively mitigates overfitting during training, resulting in substantial performance gains of over 4 BLEU points on a dataset consisting of merely 10 thousand examples.
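
The abstract does not say exactly where the dropout is placed, so the following is only a minimal sketch of one possible reading: dropout applied to the skip path of a residual block, `y = dropout(x) + sublayer(x)`. The names `dropout`, `residual_block` and `p_residual` are illustrative assumptions, not taken from the paper.

```python
import random

def dropout(vec, p, rng, training=True):
    """Inverted dropout: zero each element with probability p, rescale the rest."""
    if not training or p == 0.0:
        return list(vec)
    keep = 1.0 - p  # assumes 0 <= p < 1
    return [x / keep if rng.random() < keep else 0.0 for x in vec]

def residual_block(x, sublayer, p_residual, rng, training=True):
    """Residual block with dropout on the skip connection (assumed placement)."""
    skip = dropout(x, p_residual, rng, training)
    out = sublayer(x)  # stands in for attention or feed-forward
    return [s + o for s, o in zip(skip, out)]

if __name__ == "__main__":
    rng = random.Random(0)
    double = lambda v: [2.0 * a for a in v]  # toy sublayer
    print(residual_block([1.0, 2.0], double, 0.1, rng, training=False))
```

At inference time (`training=False`) the block reduces to the ordinary residual connection, so the modification only affects training.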

ReSeTOX: Re-learning attention weights for toxicity mitigation in machine translation
Javier García Gilabert | Carlos Escolano | Marta Costa-jussà
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

Our proposed method, RESETOX (REdo SEarch if TOXic), addresses the issue of Neural Machine Translation (NMT) generating translation outputs that contain toxic words not present in the input. The objective is to mitigate the introduction of toxic language without the need for re-training. In the case of identified added toxicity during the inference process, RESETOX dynamically adjusts the key-value self-attention weights and re-evaluates the beam search hypotheses. Experimental results demonstrate that RESETOX achieves a remarkable 57% reduction in added toxicity while maintaining an average translation quality of 99.5% across 164 languages. Our code is available at: https://github.com

BSC Submission to the AmericasNLP 2024 Shared Task
Javier Garcia Gilabert | Aleix Sant | Carlos Escolano | Francesca De Luca Fornaciari | Audrey Mash | Maite Melero
Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024)

This paper describes the BSC’s submission to the AmericasNLP 2024 Shared Task. We participated in the Spanish to Quechua and Spanish to Guarani tasks. In this paper we show that by using LoRA adapters we can achieve performance similar to full-parameter fine-tuning while training only 14.2% of the total number of parameters. Our systems achieved the highest ChrF++ scores and ranked first for both directions in the final results, outperforming strong baseline systems on the provided development and test datasets.

Unmasking Biases: Exploring Gender Bias in English-Catalan Machine Translation through Tokenization Analysis and Novel Dataset
Audrey Mash | Carlos Escolano | Aleix Sant | Maite Melero | Francesca de Luca Fornaciari
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents a comprehensive evaluation of gender bias in English-Catalan machine translation, encompassing the creation of a novel language resource and an analysis of translation quality across four different tokenization models. The study introduces a new dataset derived from the MuST-SHE corpus, focusing on gender-neutral terms that necessitate gendered translations in Catalan. The results reveal noteworthy gender bias across all translation models, with a consistent preference for masculine forms. Notably, the study finds that when context is available, BPE and Sentencepiece Unigram tokenization methods outperform others, achieving higher accuracy in gender translation. However, when no context is provided, Morfessor outputs more feminine forms than other tokenization methods, albeit still a small percentage. The study also reflects that stereotypes present in the data are amplified in the translation output. Ultimately, this work serves as a valuable resource for addressing and mitigating gender bias in machine translation, emphasizing the need for improved awareness and sensitivity to gender issues in natural language processing applications.

2023

TALP-UPC at ProbSum 2023: Fine-tuning and Data Augmentation Strategies for NER
Neil Torrero | Gerard Sant | Carlos Escolano
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

This paper describes the submission of the TALP-UPC team to the Problem List Summarization task from the BioNLP 2023 workshop. This task consists of automatically extracting a list of health issues from the e-health medical record of a given patient. Our submission combines additional steps of data annotation with fine-tuning of BERT pre-trained language models. Our experiments focus on the impact of fine-tuning on different datasets as well as the addition of data augmentation techniques to delay overfitting.

Toxicity in Multilingual Machine Translation at Scale
Marta Costa-jussà | Eric Smith | Christophe Ropers | Daniel Licht | Jean Maillard | Javier Ferrando | Carlos Escolano
Findings of the Association for Computational Linguistics: EMNLP 2023

Machine Translation systems can produce different types of errors, some of which are characterized as critical or catastrophic due to the specific negative impact that they can have on users. In this paper we focus on one type of critical error: added toxicity. We evaluate and analyze added toxicity when translating a large evaluation dataset (HOLISTICBIAS, over 472k sentences, covering 13 demographic axes) from English into 164 languages. An automatic toxicity evaluation shows that added toxicity across languages varies from 0% to 5%. The output languages with the most added toxicity tend to be low-resource ones, and the demographic axes with the most added toxicity include sexual orientation, gender and sex, and ability. We also perform human evaluation on a subset of 8 translation directions, confirming the prevalence of true added toxicity. We use a measurement of the amount of source contribution to the translation, where a low source contribution implies hallucination, to interpret what causes toxicity. Making use of the input attributions allows us to explain toxicity, because the source contributions significantly correlate with toxicity for 84% of languages studied. Given our findings, our recommendations to reduce added toxicity are to curate training data to avoid mistranslations, mitigate hallucination and check unstable translations.

2022

Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer
Javier Ferrando | Gerard I. Gállego | Belen Alastruey | Carlos Escolano | Marta R. Costa-jussà
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (what has been previously translated at a decoding step). However, previous work on interpretability in NMT has mainly focused solely on source sentence tokens’ attributions. Therefore, we lack a full understanding of the influences of every input token (source sentence and target prefix) in the model predictions. In this work, we propose an interpretability method that tracks input tokens’ attributions for both contexts. Our method, which can be extended to any encoder-decoder Transformer-based model, allows us to better comprehend the inner workings of current NMT models. We apply the proposed method to both bilingual and multilingual Transformers and present insights into their behaviour.

Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: UPC at IWSLT 2022
Ioannis Tsiamas | Gerard I. Gállego | Carlos Escolano | José Fonollosa | Marta R. Costa-jussà
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper describes the submissions of the UPC Machine Translation group to the IWSLT 2022 Offline Speech Translation and Speech-to-Speech Translation tracks. The offline task involves translating English speech to German, Japanese and Chinese text. Our Speech Translation systems are trained end-to-end and are based on large pretrained speech and text models. We use an efficient fine-tuning technique that trains only specific layers of our system, and explore the use of adapter modules for the non-trainable layers. We further investigate the suitability of different speech encoders (wav2vec 2.0, HuBERT) for our models and the impact of knowledge distillation from the Machine Translation model that we use for the decoder (mBART). For segmenting the IWSLT test sets we fine-tune a pretrained audio segmentation model and achieve improvements of 5 BLEU compared to the given segmentation. Our best single model uses HuBERT and parallel adapters and achieves 29.42 BLEU on the English-German MuST-C tst-COMMON set and 26.77 on the IWSLT 2020 test set. By ensembling many models, we further increase translation quality to 30.83 and 27.78 BLEU, respectively. Furthermore, our submission for English-Japanese achieves 15.85 and English-Chinese obtains 25.63 BLEU on the MuST-C tst-COMMON sets. Finally, we extend our system to perform English-German Speech-to-Speech Translation with a pretrained Text-to-Speech model.

2021

Multi-Task Learning for Improving Gender Accuracy in Neural Machine Translation
Carlos Escolano | Graciela Ojeda | Christine Basta | Marta R. Costa-jussa
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Machine Translation is highly impacted by social biases present in data sets, indicating that it reflects and amplifies stereotypes. In this work, we study mitigating gender bias by jointly learning the translation, the part-of-speech, and the gender of the target language with different morphological complexity. This approach has shown improvements up to 6.8 points in gender accuracy without significantly impacting the translation quality.

Enriching the Transformer with Linguistic Factors for Low-Resource Machine Translation
Jordi Armengol-Estapé | Marta R. Costa-jussà | Carlos Escolano
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Introducing factors, that is to say, word features such as linguistic information referring to the source tokens, is known to improve the results of neural machine translation systems in certain settings, typically in recurrent architectures. This study proposes enhancing the current state-of-the-art neural machine translation architecture, the Transformer, so that it can incorporate external knowledge. In particular, our proposed modification, the Factored Transformer, uses linguistic factors that insert additional knowledge into the machine translation system. Apart from using different kinds of features, we study the effect of different architectural configurations. Specifically, we analyze the performance of combining words and features at the embedding level or at the encoder level, and we experiment with two different combination strategies. With the best-found configuration, we show improvements of 0.8 BLEU over the baseline Transformer in the IWSLT German-to-English task. Moreover, we experiment with the more challenging FLoRes English-to-Nepali benchmark, which includes both extremely low-resourced and very distant languages, and obtain an improvement of 1.2 BLEU.

End-to-End Speech Translation with Pre-trained Models and Adapters: UPC at IWSLT 2021
Gerard I. Gállego | Ioannis Tsiamas | Carlos Escolano | José A. R. Fonollosa | Marta R. Costa-jussà
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Machine Translation group. The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text. Submitted systems can be either cascade or end-to-end and use a custom or given segmentation. Our submission is an end-to-end speech translation system, which combines pre-trained models (Wav2Vec 2.0 and mBART) with coupling modules between the encoder and decoder, and uses an efficient fine-tuning technique, which trains only 20% of its total parameters. We show that adding an Adapter to the system and pre-training it can increase the convergence speed and the final result, with which we achieve a BLEU score of 27.3 on the MuST-C test set. Our final model is an ensemble that obtains a BLEU score of 28.22 on the same set. Our submission also uses a custom segmentation algorithm that employs pre-trained Wav2Vec 2.0 for identifying periods of untranscribable text and can bring improvements of 2.5 to 3 BLEU on the IWSLT 2019 test set, as compared to the result with the given segmentation.

Multilingual Machine Translation: Closing the Gap between Shared and Language-specific Encoder-Decoders
Carlos Escolano | Marta R. Costa-jussà | José A. R. Fonollosa | Mikel Artetxe
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

State-of-the-art multilingual machine translation relies on a universal encoder-decoder, which requires retraining the entire system to add new languages. In this paper, we propose an alternative approach that is based on language-specific encoder-decoders, and can thus be more easily extended to new languages by learning their corresponding modules. So as to encourage a common interlingua representation, we simultaneously train the N initial languages. Our experiments show that the proposed approach outperforms the universal encoder-decoder by 3.28 BLEU points on average, while allowing new languages to be added without the need to retrain the rest of the modules. All in all, our work closes the gap between shared and language-specific encoder-decoders, advancing toward modular multilingual machine translation systems that can be flexibly extended in lifelong learning settings.

The TALP-UPC Participation in WMT21 News Translation Task: an mBART-based NMT Approach
Carlos Escolano | Ioannis Tsiamas | Christine Basta | Javier Ferrando | Marta R. Costa-jussa | José A. R. Fonollosa
Proceedings of the Sixth Conference on Machine Translation

This paper describes the submission to the WMT 2021 news translation shared task by the UPC Machine Translation group. The goal of the task is to translate German to French (De-Fr) and French to German (Fr-De). Our submission focuses on fine-tuning a pre-trained model to take advantage of monolingual data. We fine-tune mBART50 using the filtered data, and additionally, we train a Transformer model on the same data from scratch. In the experiments, we show that fine-tuning mBART50 results in 31.69 BLEU for De-Fr and 23.63 BLEU for Fr-De, an increase of 2.71 and 1.90 BLEU, respectively, over the model we train from scratch. Our final submission is an ensemble of these two models, which adds a further 0.3 BLEU for Fr-De.

2020

The TALP-UPC System Description for WMT20 News Translation Task: Multilingual Adaptation for Low Resource MT
Carlos Escolano | Marta R. Costa-jussà | José A. R. Fonollosa
Proceedings of the Fifth Conference on Machine Translation

In this article, we describe the TALP-UPC participation in the WMT20 news translation shared task for Tamil-English. Given the low amount of parallel training data, we resort to adapting the task to a multilingual system to benefit from the positive transfer from high-resource languages. We use iterative backtranslation to fine-tune the system and benefit from the monolingual data available. In order to measure the effectiveness of such methods, we compare our results to a bilingual baseline system.

2019

From Bilingual to Multilingual Neural Machine Translation by Incremental Training
Carlos Escolano | Marta R. Costa-jussà | José A. R. Fonollosa
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Multilingual Neural Machine Translation approaches are based on the use of task-specific models, and the addition of one more language can only be done by retraining the whole system. In this work, we propose a new training schedule that allows the system to scale to more languages without modification of the previous components. It is based on joint training and language-independent encoder/decoder modules, which allow for zero-shot translation. This work in progress shows results close to the state of the art in the WMT task.

Multilingual, Multi-scale and Multi-layer Visualization of Intermediate Representations
Carlos Escolano | Marta R. Costa-jussà | Elora Lacroux | Pere-Pau Vázquez
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

The main alternatives nowadays to deal with sequences are Recurrent Neural Network (RNN) architectures and the Transformer. In this context, both RNNs and the Transformer have been used as an encoder-decoder architecture with multiple layers in each module. Far beyond this, these architectures are the basis for the contextual word embeddings which are revolutionizing most natural language downstream applications. However, intermediate representations in either the RNN or Transformer architectures can be difficult to interpret. To make these layer representations more accessible and meaningful, we introduce a web-based tool that visualizes them both at the sentence and token level. We present three use cases. The first analyses gender issues in contextual word embeddings. The second and third show multilingual intermediate representations for sentences and tokens, and the evolution of these intermediate representations along the multiple layers of the decoder in the context of multilingual machine translation.

The TALP-UPC Machine Translation Systems for WMT19 News Translation Task: Pivoting Techniques for Low Resource MT
Noe Casas | José A. R. Fonollosa | Carlos Escolano | Christine Basta | Marta R. Costa-jussà
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

In this article, we describe the TALP-UPC research group participation in the WMT19 news translation shared task for Kazakh-English. Given the low amount of parallel training data, we resort to using Russian as pivot language, training subword-based statistical translation systems for Russian-Kazakh and Russian-English that were then used to create two synthetic pseudo-parallel corpora for Kazakh-English and English-Kazakh respectively. Finally, a self-attention model based on the decoder part of the Transformer architecture was trained on the two pseudo-parallel corpora.
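
The pivoting step can be sketched roughly as follows, assuming that each synthetic corpus is built by translating the Russian side of a corpus that pairs Russian with the other language. The abstract does not spell this out, so the data flow, the helper `make_pseudo_parallel` and the toy translators `ru_to_kk` and `ru_to_en` are hypothetical stand-ins, not the systems used in the paper.

```python
def make_pseudo_parallel(pairs, translate_pivot):
    """Turn (pivot_sentence, real_sentence) pairs into (synthetic, real) pairs."""
    return [(translate_pivot(pivot), real) for pivot, real in pairs]

# Hypothetical stand-ins for the trained Russian-Kazakh and Russian-English systems.
def ru_to_kk(sentence):
    return f"kk<{sentence}>"

def ru_to_en(sentence):
    return f"en<{sentence}>"

# Placeholder corpora; real data would hold actual sentences.
ru_en_corpus = [("ru_a", "en_a"), ("ru_b", "en_b")]
ru_kk_corpus = [("ru_c", "kk_c")]

# Synthetic Kazakh paired with real English, and synthetic English with real Kazakh.
synthetic_kk_en = make_pseudo_parallel(ru_en_corpus, ru_to_kk)
synthetic_en_kk = make_pseudo_parallel(ru_kk_corpus, ru_to_en)

if __name__ == "__main__":
    print(synthetic_kk_en)
    print(synthetic_en_kk)
```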

2018

The TALP-UPC Machine Translation Systems for WMT18 News Shared Translation Task
Noe Casas | Carlos Escolano | Marta R. Costa-jussà | José A. R. Fonollosa
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

In this article we describe the TALP-UPC research group participation in the WMT18 news shared translation task for Finnish-English and Estonian-English within the multi-lingual subtrack. All of our primary submissions implement an attention-based Neural Machine Translation architecture. Given that Finnish and Estonian belong to the same language family and are similar, we use as training data the combination of the datasets of both language pairs to palliate the data scarcity of each individual pair. We also report the translation quality of systems trained on individual language pair data to serve as baseline and comparison reference.

2017

Byte-based Neural Machine Translation
Marta R. Costa-jussà | Carlos Escolano | José A. R. Fonollosa
Proceedings of the First Workshop on Subword and Character Level Models in NLP

This paper presents experiments comparing character-based and byte-based neural machine translation systems. The main motivation of the byte-based neural machine translation system is to build multi-lingual neural machine translation systems that can share the same vocabulary. We compare the performance of both systems in several language pairs and we see that the performance in test is similar for most language pairs while the training time is slightly reduced in the case of byte-based neural machine translation.
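
To illustrate why a byte-level representation lets languages share one vocabulary, here is a small sketch assuming UTF-8 encoding (the abstract does not name the encoding). The helper names `char_tokens` and `byte_tokens` are illustrative only.

```python
def char_tokens(text):
    """Character-level view: one token per Unicode code point."""
    return list(text)

def byte_tokens(text):
    """Byte-level view: one token per UTF-8 byte, so ids always lie in 0..255."""
    return list(text.encode("utf-8"))

def bytes_to_text(ids):
    """Invert byte_tokens for a valid UTF-8 byte sequence."""
    return bytes(ids).decode("utf-8")

if __name__ == "__main__":
    sample = "jussà"
    print(char_tokens(sample))  # ['j', 'u', 's', 's', 'à']
    print(byte_tokens(sample))  # [106, 117, 115, 115, 195, 160]
```

The character vocabulary grows with every new script, whereas the byte vocabulary is fixed at 256 symbols, at the cost of longer sequences for non-ASCII text.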

The TALP-UPC Neural Machine Translation System for German/Finnish-English Using the Inverse Direction Model in Rescoring
Carlos Escolano | Marta R. Costa-jussà | José A. R. Fonollosa
Proceedings of the Second Conference on Machine Translation

2016

The TALP-UPC Spanish–English WMT Biomedical Task: Bilingual Embeddings and Char-based Neural Language Model Rescoring in a Phrase-based System
Marta R. Costa-jussà | Cristina España-Bonet | Pranava Madhyastha | Carlos Escolano | José A. R. Fonollosa
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers