Diego Moussallem


2024

Benchmarking Low-Resource Machine Translation Systems
Ana Silva | Nikit Srivastava | Tatiana Moteu Ngoli | Michael Röder | Diego Moussallem | Axel-Cyrille Ngonga Ngomo
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)

Assessing the performance of machine translation systems is of critical value, especially to languages with lower resource availability. Due to the large evaluation effort required by the translation task, studies often compare new systems against single systems or commercial solutions. Consequently, determining the best-performing system for specific languages is often unclear. This work benchmarks publicly available translation systems across 4 datasets and 26 languages, including low-resource languages. We consider both effectiveness and efficiency in our evaluation. Our results are made public through BENG, a FAIR benchmarking platform for Natural Language Generation tasks.

2020

Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)
Thiago Castro Ferreira | Claire Gardent | Nikolai Ilinykh | Chris van der Lee | Simon Mille | Diego Moussallem | Anastasia Shimorina
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

A General Benchmarking Framework for Text Generation
Diego Moussallem | Paramjot Kaur | Thiago Ferreira | Chris van der Lee | Anastasia Shimorina | Felix Conrads | Michael Röder | René Speck | Claire Gardent | Simon Mille | Nikolai Ilinykh | Axel-Cyrille Ngonga Ngomo
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

The RDF-to-text task has recently gained substantial attention due to the continuous growth of RDF knowledge graphs in number and size. Recent studies have focused on systematically comparing RDF-to-text approaches on benchmarking datasets such as WebNLG. Although some evaluation tools have already been proposed for text generation, none of the existing solutions abides by the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles and involves RDF data for the knowledge extraction task. In this paper, we present BENG, a FAIR benchmarking platform for Natural Language Generation (NLG) and Knowledge Extraction systems with a focus on RDF data. BENG builds upon the successful benchmarking platform GERBIL, is open source, and is publicly available along with the data it contains.

The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task: Overview and Evaluation Results (WebNLG+ 2020)
Thiago Castro Ferreira | Claire Gardent | Nikolai Ilinykh | Chris van der Lee | Simon Mille | Diego Moussallem | Anastasia Shimorina
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

WebNLG+ offers two challenges: (i) mapping sets of RDF triples to English or Russian text (generation) and (ii) converting English or Russian text to sets of RDF triples (semantic parsing). Compared to the eponymous WebNLG challenge, WebNLG+ provides an extended dataset that enables the training, evaluation, and comparison of microplanners and semantic parsers. In this paper, we present the results of the generation and semantic parsing tasks for both English and Russian and provide a brief description of the participating systems.

2019

A Holistic Natural Language Generation Framework for the Semantic Web
Axel-Cyrille Ngonga Ngomo | Diego Moussallem | Lorenz Bühmann
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

With the ever-growing generation of data for the Semantic Web comes an increasing demand for this data to be made available to non-semantic Web experts. One way of achieving this goal is to translate the languages of the Semantic Web into natural language. We present LD2NL, a framework that allows verbalizing the three key languages of the Semantic Web, i.e., RDF, OWL, and SPARQL. Our framework is based on a bottom-up approach to verbalization. We evaluated LD2NL in an open survey with 86 persons. Our results suggest that our framework can generate verbalizations that are close to natural languages and that can be easily understood by non-experts. Therewith, it enables non-domain experts to interpret Semantic Web data with more than 91% of the accuracy of domain experts.

2018

Enriching the WebNLG corpus
Thiago Castro Ferreira | Diego Moussallem | Emiel Krahmer | Sander Wubben
Proceedings of the 11th International Conference on Natural Language Generation

This paper describes the enrichment of the WebNLG corpus (Gardent et al., 2017a,b), with the aim to further extend its usefulness as a resource for evaluating common NLG tasks, including Discourse Ordering, Lexicalization and Referring Expression Generation. We also produce a silver-standard German translation of the corpus to enable the exploitation of NLG approaches to languages other than English. The enriched corpus is publicly available.

BENGAL: An Automatic Benchmark Generator for Entity Recognition and Linking
Axel-Cyrille Ngonga Ngomo | Michael Röder | Diego Moussallem | Ricardo Usbeck | René Speck
Proceedings of the 11th International Conference on Natural Language Generation

The manual creation of gold standards for named entity recognition and entity linking is time- and resource-intensive. Moreover, recent works show that such gold standards contain a large proportion of mistakes in addition to being difficult to maintain. We hence present Bengal, a novel approach for the automatic generation of such gold standards as a complement to manually created benchmarks. The main advantage of our benchmarks is that they can be readily generated at any time. They are also cost-effective while being guaranteed to be free of annotation errors. We compare the performance of 11 tools on benchmarks in English generated by Bengal and on 16 benchmarks created manually. We show that our approach can be ported easily across languages by presenting results achieved by 4 tools on both Brazilian Portuguese and Spanish. Overall, our results suggest that our automatic benchmark generation approach can create varied benchmarks that have characteristics similar to those of existing benchmarks. Our approach is open-source. Our experimental results are available at http://faturl.com/bengalexpinlg and the code at https://github.com/dice-group/BENGAL.

LIdioms: A Multilingual Linked Idioms Data Set
Diego Moussallem | Mohamed Ahmed Sherif | Diego Esteves | Marcos Zampieri | Axel-Cyrille Ngonga Ngomo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

RDF2PT: Generating Brazilian Portuguese Texts from RDF Data
Diego Moussallem | Thiago Ferreira | Marcos Zampieri | Maria Claudia Cavalcanti | Geraldo Xexéo | Mariana Neves | Axel-Cyrille Ngonga Ngomo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

NeuralREG: An end-to-end approach to referring expression generation
Thiago Castro Ferreira | Diego Moussallem | Ákos Kádár | Sander Wubben | Emiel Krahmer
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Traditionally, Referring Expression Generation (REG) models first decide on the form and then on the content of references to discourse entities in text, typically relying on features such as salience and grammatical function. In this paper, we present a new approach (NeuralREG), relying on deep neural networks, which makes decisions about form and content in one go without explicit feature extraction. Using a delexicalized version of the WebNLG corpus, we show that the neural model substantially improves over two strong baselines.