Daniel Ferrés

Also published as: Dani Ferrés


2022

pdf bib
ALEXSIS: A Dataset for Lexical Simplification in Spanish
Daniel Ferrés | Horacio Saggion
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Lexical Simplification is the process of reducing the lexical complexity of a text by replacing difficult words with easier to read (or understand) expressions while preserving the original information and meaning. In this paper we introduce ALEXSIS, a new dataset for this task, and we use ALEXSIS to benchmark Lexical Simplification systems in Spanish. The paper describes the evaluation of three kind of approaches to Lexical Simplification, a thesaurus-based approach, a single transformers-based approach, and a combination of transformers. We also report state of the art results on a previous Lexical Simplification dataset for Spanish.

pdf bib
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Sanja Štajner | Horacio Saggion | Daniel Ferrés | Matthew Shardlow | Kim Cheng Sheang | Kai North | Marcos Zampieri | Wei Xu
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

pdf bib
Controllable Lexical Simplification for English
Kim Cheng Sheang | Daniel Ferrés | Horacio Saggion
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

Fine-tuning Transformer-based approaches have recently shown exciting results on sentence simplification task. However, so far, no research has applied similar approaches to the Lexical Simplification (LS) task. In this paper, we present ConLS, a Controllable Lexical Simplification system fine-tuned with T5 (a Transformer-based model pre-trained with a BERT-style approach and several other tasks). The evaluation results on three datasets (LexMTurk, BenchLS, and NNSeval) have shown that our model performs comparable to LSBert (the current state-of-the-art) and even outperforms it in some cases. We also conducted a detailed comparison on the effectiveness of control tokens to give a clear view of how each token contributes to the model.

pdf bib
Findings of the TSAR-2022 Shared Task on Multilingual Lexical Simplification
Horacio Saggion | Sanja Štajner | Daniel Ferrés | Kim Cheng Sheang | Matthew Shardlow | Kai North | Marcos Zampieri
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

We report findings of the TSAR-2022 shared task on multilingual lexical simplification, organized as part of the Workshop on Text Simplification, Accessibility, and Readability TSAR-2022 held in conjunction with EMNLP 2022. The task called the Natural Language Processing research community to contribute with methods to advance the state of the art in multilingual lexical simplification for English, Portuguese, and Spanish. A total of 14 teams submitted the results of their lexical simplification systems for the provided test data. Results of the shared task indicate new benchmarks in Lexical Simplification with English lexical simplification quantitative results noticeably higher than those obtained for Spanish and (Brazilian) Portuguese.

2018

pdf bib
PDFdigest: an Adaptable Layout-Aware PDF-to-XML Textual Content Extractor for Scientific Articles
Daniel Ferrés | Horacio Saggion | Francesco Ronzano | Àlex Bravo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
An Adaptable Lexical Simplification Architecture for Major Ibero-Romance Languages
Daniel Ferrés | Horacio Saggion | Xavier Gómez Guinovart
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

Lexical Simplification is the task of reducing the lexical complexity of textual documents by replacing difficult words with easier to read (or understand) expressions while preserving the original meaning. The development of robust pipelined multilingual architectures able to adapt to new languages is of paramount importance in lexical simplification. This paper describes and evaluates a modular hybrid linguistic-statistical Lexical Simplifier that deals with the four major Ibero-Romance Languages: Spanish, Portuguese, Catalan, and Galician. The architecture of the system is the same for the four languages addressed, only the language resources used during simplification are language specific.

2007

pdf bib
Machine Learning with Semantic-Based Distances Between Sentences for Textual Entailment
Daniel Ferrés | Horacio Rodríguez
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

pdf bib
Experiments Adapting an Open-Domain Question Answering System to the Geographical Domain Using Scope-Based Resources
Daniel Ferrés | Horacio Rodríguez
Proceedings of the Workshop on Multilingual Question Answering - MLQA ‘06

2004

pdf bib
Automatic Classification of Geographic Named Entities
Daniel Ferrés | Marc Massot | Muntsa Padró | Horacio Rodríguez | Jordi Turmo
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Automatic Building Gazetteers of Co-referring Named Entities
Daniel Ferrés | Marc Massot | Muntsa Padró | Horacio Rodríguez | Jordi Turmo
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Senseval-3: The Spanish lexical sample task
Lluis Màrquez | Mariona Taulé | Antonia Martí | Núria Artigas | Mar García | Francis Real | Dani Ferrés
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

pdf bib
Senseval-3: The Catalan lexical sample task
Lluis Màrquez | Mariona Taulé | Antonia Martí | Mar García | Francis Real | Dani Ferrés
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text