Gilles Sérasset

Also published as: Gilles Serasset


2024

KGAST: From Knowledge Graphs to Annotated Synthetic Texts
Nakanyseth Vuth | Gilles Sérasset | Didier Schwab
Proceedings of the 1st Workshop on Knowledge Graphs and Large Language Models (KaLLM 2024)

In recent years, the use of synthetic data, either as a complement or a substitute for original data, has emerged as a solution to challenges such as data scarcity and security risks. This paper is an initial attempt to automatically generate such data for Information Extraction tasks. We accomplished this by developing a novel synthetic data generation framework called KGAST, which leverages Knowledge Graphs and Large Language Models. In our preliminary study, we conducted simple experiments to generate synthetic versions of two datasets—a French security defense dataset and an English general domain dataset, after which we evaluated them both intrinsically and extrinsically. The results indicated that synthetic data can effectively complement original data, improving the performance of models on classes with limited training samples. This highlights KGAST’s potential as a tool for generating synthetic data for Information Extraction tasks.

Proceedings of the Workshop on Deep Learning and Linked Data (DLnLD) @ LREC-COLING 2024
Gilles Sérasset | Hugo Gonçalo Oliveira | Giedre Valunaite Oleskeviciene
Proceedings of the Workshop on Deep Learning and Linked Data (DLnLD) @ LREC-COLING 2024

Bridging Computational Lexicography and Corpus Linguistics: A Query Extension for OntoLex-FrAC
Christian Chiarcos | Ranka Stanković | Maxim Ionov | Gilles Sérasset
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

OntoLex, the dominant community standard for machine-readable lexical resources in the context of RDF, Linked Data and Semantic Web technologies, is currently being extended with a designated module for Frequency, Attestations and Corpus-based Information (OntoLex-FrAC). We propose a novel component for OntoLex-FrAC, addressing the incorporation of corpus queries for (a) linking dictionaries with corpus engines, (b) enabling RDF-based web services to exchange corpus queries and response data dynamically, and (c) using conventional query languages to formalize the internal structure of collocations, word sketches, and colligations. The primary field of application of the query extension is in digital lexicography and corpus linguistics, and we present a proof-of-principle implementation in backend components of a novel platform designed to support digital lexicography for the Serbian language.

From Linguistic Linked Data to Big Data
Dimitar Trajanov | Elena Apostol | Radovan Garabik | Katerina Gkirtzou | Dagmar Gromann | Chaya Liebeskind | Cosimo Palma | Michael Rosner | Alexia Sampri | Gilles Sérasset | Blerina Spahiu | Ciprian-Octavian Truică | Giedre Valunaite Oleskeviciene
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

With advances in the field of Linked (Open) Data (LOD), language data on the LOD cloud has grown in number, size, and variety. With an increased volume and variety of language data, optimizations of methods for distributing, storing, and querying these data become more central. To this end, this position paper investigates use cases at the intersection of LLOD and Big Data and existing approaches to utilizing Big Data techniques within the context of linked data, and discusses the challenges and benefits of this union.

MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations
Dagmar Gromann | Hugo Goncalo Oliveira | Lucia Pitarch | Elena-Simona Apostol | Jordi Bernad | Eliot Bytyçi | Chiara Cantone | Sara Carvalho | Francesca Frontini | Radovan Garabik | Jorge Gracia | Letizia Granata | Fahad Khan | Timotej Knez | Penny Labropoulou | Chaya Liebeskind | Maria Pia Di Buono | Ana Ostroški Anić | Sigita Rackevičienė | Ricardo Rodrigues | Gilles Sérasset | Linas Selmistraitis | Mahammadou Sidibé | Purificação Silvano | Blerina Spahiu | Enriketa Sogutlu | Ranka Stanković | Ciprian-Octavian Truică | Giedre Valunaite Oleskeviciene | Slavko Zitnik | Katerina Zdravkova
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has either focused on analysing lexical semantic relations in word embeddings or probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages including low-resource languages, such as Bambara, Lithuanian, and Albanian. As an experiment on cross-lingual transfer of relational knowledge, we test the PLMs’ ability to (1) capture analogies across languages, and (2) predict translation targets. We find considerable differences across relation types and languages with a clear preference for hypernymy and antonymy as well as Romance languages.

On Modelling Corpus Citations in Computational Lexical Resources
Fahad Khan | Maxim Ionov | Christian Chiarcos | Laurent Romary | Gilles Sérasset | Besim Kabashi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this article we look at how two different standards for lexical resources, TEI and OntoLex, deal with corpus citations in lexicons. We will focus on how corpus citations in retrodigitised dictionaries can be modelled using each of the two standards since this provides us with a suitably challenging use case. After looking at the structure of an example entry from a legacy dictionary, we examine the two approaches offered by the two different standards by outlining an encoding for the example entry using both of them (note that this article features the first extended discussion of how the Frequency Attestation and Corpus (FrAC) module of OntoLex deals with citations). After comparing the two approaches and looking at the advantages and disadvantages of both, we argue for a combination of both. In the last part of the article we discuss different ways of doing this, giving our preference for a strategy which makes use of RDFa.

2023

Enriching Multiword Terms in Wiktionary with Pronunciation Information
Lenka Bajcetic | Thierry Declerck | Gilles Sérasset
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)

We report on work in progress dealing with the automated generation of pronunciation information for English multiword terms (MWTs) in Wiktionary, combining information available for their single components. We describe the issues we encountered, the building of an evaluation dataset, and our collaboration with the DBnary resource maintainer. Our approach shows potential for automatically adding morphosyntactic and semantic information to the components of such MWTs.

Leveraging DBnary Data to Enrich Information of Multiword Terms in Wiktionary
Gilles Sérasset | Thierry Declerck | Lenka Bajčetić
Proceedings of the 4th Conference on Language, Data and Knowledge

DBnary2Vec: Preliminary Study on Lexical Embeddings for Downstream NLP Tasks
Nakanyseth Vuth | Gilles Sérasset
Proceedings of the 4th Conference on Language, Data and Knowledge

2022

A Cheap and Dirty Cross-Lingual Linking Service in the Cloud
Christian Chiarcos | Gilles Sérasset
Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference

In this paper, we describe the application of Linguistic Linked Open Data (LLOD) technology for dynamic cross-lingual querying on demand. Whereas most related research focuses on providing a static linking, i.e., cross-lingual inference, and then storing the resulting links, we demonstrate the application of the federation capabilities of SPARQL to perform lexical linking on the fly. In the end, we provide a baseline functionality that uses the connection of two web services – a SPARQL end point for multilingual lexical data and another SPARQL end point for querying an English language knowledge graph – in order to query an English language knowledge graph using foreign language labels. We argue that, for low-resource languages where substantial native knowledge graphs are lacking, this functionality can be used to lower the language barrier by allowing users to formulate cross-linguistically applicable queries mediated by a multilingual dictionary.

Cross-Lingual Link Discovery for Under-Resourced Languages
Michael Rosner | Sina Ahmadi | Elena-Simona Apostol | Julia Bosque-Gil | Christian Chiarcos | Milan Dojchinovski | Katerina Gkirtzou | Jorge Gracia | Dagmar Gromann | Chaya Liebeskind | Giedrė Valūnaitė Oleškevičienė | Gilles Sérasset | Ciprian-Octavian Truică
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that, for languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.

2015

METEOR for multiple target languages using DBnary
Zied Elloumi | Hervé Blanchon | Gilles Serasset | Laurent Besacier
Proceedings of Machine Translation Summit XV: Papers

2014

Word Sense Induction for Lexical Resource Enrichment (Induction de sens pour enrichir des ressources lexicales) [in French]
Mohammad Nasiruddin | Didier Schwab | Andon Tchechmedjiev | Gilles Sérasset | Hervé Blanchon
Proceedings of TALN 2014 (Volume 2: Short Papers)

Lexical Networks, Natural Language Processing and Linked Open Data (Réseaux Lexicaux, Traitement des Langues, et Données Liées Ouvertes) [in French]
Gilles Sérasset
TALN-RECITAL 2014 Workshop RLTLN 2014 : Réseaux Lexicaux pour le TAL (RLTLN 2014 : Lexical Networks for NLP)

2013

GETALP System : Propagation of a Lesk Measure through an Ant Colony Algorithm
Didier Schwab | Andon Tchechmedjiev | Jérôme Goulian | Mohammad Nasiruddin | Gilles Sérasset | Hervé Blanchon
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

JEP-TALN-RECITAL 2012, Workshop DEFT 2012: DÉfi Fouille de Textes (DEFT 2012 Workshop: Text Mining Challenge)
Cyril Grouin | Dominic Forest | Gilles Sérasset
JEP-TALN-RECITAL 2012, Workshop DEFT 2012: DÉfi Fouille de Textes (DEFT 2012 Workshop: Text Mining Challenge)

JEP-TALN-RECITAL 2012, Workshop DEGELS 2012: Défi GEste Langue des Signes (DEGELS 2012: Gestures and Sign Language Challenge)
Annelies Braffort | Leïla Boutora | Gilles Sérasset
JEP-TALN-RECITAL 2012, Workshop DEGELS 2012: Défi GEste Langue des Signes (DEGELS 2012: Gestures and Sign Language Challenge)

JEP-TALN-RECITAL 2012, Workshop TALAf 2012: Traitement Automatique des Langues Africaines (TALAf 2012: African Language Processing)
Chantal Enguehard | Mathieu Mangeot | Gilles Sérasset
JEP-TALN-RECITAL 2012, Workshop TALAf 2012: Traitement Automatique des Langues Africaines (TALAf 2012: African Language Processing)

JEP-TALN-RECITAL 2012, Workshop ILADI 2012: Interactions Langagières pour personnes Agées Dans les habitats Intelligents (ILADI 2012: Language Interaction for Elderly in Smart Homes)
François Portet | Michel Vacher | Gilles Sérasset
JEP-TALN-RECITAL 2012, Workshop ILADI 2012: Interactions Langagières pour personnes Agées Dans les habitats Intelligents (ILADI 2012: Language Interaction for Elderly in Smart Homes)

Parameter estimation under uncertainty with Simulated Annealing applied to an ant colony based probabilistic WSD algorithm
Andon Tchechmedjiev | Jérôme Goulian | Didier Schwab | Gilles Sérasset
Proceedings of the First International Workshop on Optimization Techniques for Human Language Technology

Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP
Laurent Besacier | Benjamin Lecouteux | Gilles Sérasset
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN
Georges Antoniadis | Hervé Blanchon | Gilles Sérasset
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 3: RECITAL
Jorge Mauricio Molina Mejia | Didier Schwab | Gilles Sérasset
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 3: RECITAL

Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 4: Invited Conferences
Laurent Besacier | Hervé Blanchon | Marie-Paule Jacques | Nathalie Vallée | Gilles Sérasset
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 4: Invited Conferences

Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 5: Software Demonstrations
Laurent Besacier | Hervé Blanchon | Gilles Sérasset
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 5: Software Demonstrations

Dbnary: Wiktionary as a LMF based Multilingual RDF network
Gilles Sérasset
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Contributive resources, such as Wikipedia, have proved to be valuable in Natural Language Processing or Multilingual Information Retrieval applications. This article focusses on Wiktionary, the dictionary part of the collaborative resources sponsored by the Wikimedia Foundation. In this article we present a word net that has been extracted from the French, English and German Wiktionaries. We present the structure of this word net and discuss the specific extraction problems induced by this kind of contributive resource and the method used to overcome them. Then we show how we represent the extracted data as a Lexical Markup Framework (LMF) compatible lexical network represented in Resource Description Framework (RDF) format.

2006

Proceedings of the Workshop on Multilingual Language Resources and Interoperability
Andreas Witt | Gilles Sérasset | Susan Armstrong | Jim Breen | Ulrich Heid | Felix Sasaki
Proceedings of the Workshop on Multilingual Language Resources and Interoperability

The LexALP Information System: Term Bank and Corpus for Multilingual Legal Terminology Consolidated
Verena Lyding | Elena Chiocchetti | Gilles Sérasset | Francis Brunet-Manquat
Proceedings of the Workshop on Multilingual Language Resources and Interoperability

Multilingual Legal Terminology on the Jibiki Platform: The LexALP Project
Gilles Sérasset | Francis Brunet-Manquat | Elena Chiocchetti
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

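Building a Multilingual Legal Terminology Base with the Generic Jibiki Platform: The LexALP Project (Création d’une base terminologique juridique multilingue à l’aide de la plateforme générique Jibiki : le projet LexALP) [in French]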
Francis Brunet-Manquat | Gilles Sérasset
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

This article presents the use of "Jibiki" (the development platform of the Papillon web server) within the LexALP project. The aim of this project is to harmonise the terminology of the four languages (French, German, Italian and Slovene) of the Alpine Convention so that the member states can cooperate effectively. To this end, the project uses the Jibiki platform to build a term bank that makes it possible to compare the specialised terminology of seven legal systems in four languages and to harmonise it, thereby improving understanding between Alpine states on environmental issues at the supranational level. In this article, we show how the generic Jibiki platform can be used to manage a particular dictionary.

2004

A Generic Collaborative Platform for Multilingual Lexical Database Development
Gilles Sérasset
Proceedings of the Workshop on Multilingual Linguistic Resources

2002

Frameworks, Implementation and Open Problems for the Collaborative Building of a Multilingual Lexical Database
Mathieu Mangeot-Lerebours | Gilles Sérasset | Frédéric Andrès
COLING-02: SEMANET: Building and Using Semantic Networks

The PAPILLON Project: Cooperatively Building a Multilingual Lexical Data-base to Derive Open Source Dictionaries & Lexicons
Christian Boitet | Mathieu Mangeot | Gilles Sérasset
COLING-02: The 2nd Workshop on NLP and XML (NLPXML-2002)

2000

On UNL as the future “html of the linguistic content” & the reuse of existing NLP components in UNL-related applications with the example of a UNL-French deconverter
Gilles Sérasset | Christian Boitet
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1999

UNL-French deconversion as transfer & generation from an interlingua with possible quality enhancement through offline human interaction
Gilles Sérasset | Christian Boitet
Proceedings of Machine Translation Summit VII

We present the architecture of the UNL-French deconverter, which "generates" from the UNL interlingua by first "localizing" the UNL form for French, within UNL, and then applying slightly adapted but classical transfer and generation techniques, implemented in GETA's Ariane-G5 environment, supplemented by some UNL-specific tools. Online interaction can be used during deconversion to enhance output quality and is now used for development purposes. We show how interaction could be delayed and embedded in the postedition phase, which would then interact not directly with the output text, but indirectly with several components of the deconverter. Interacting online or offline can improve the quality not only of the utterance at hand, but also of the utterances processed later, as various preferences may be automatically changed to let the deconverter "learn".

1994

Interlingual Lexical Organisation for Multilingual Lexical Databases in NADIA
Gilles Serasset
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics