Paola Marongiu


2024

Language Pivoting from Parallel Corpora for Word Sense Disambiguation of Historical Languages: A Case Study on Latin
Iacopo Ghinassi | Simone Tedeschi | Paola Marongiu | Roberto Navigli | Barbara McGillivray
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Word Sense Disambiguation (WSD) is an important task in NLP, which serves the purpose of automatically disambiguating a polysemous word with its most likely sense in context. Recent studies have advanced the state of the art in this task, but most of the work has been carried out on contemporary English or other modern languages, leaving challenges posed by low-resource languages and diachronic change open. Although the problem with low-resource languages has recently been mitigated by using existing multilingual resources to propagate otherwise expensive annotations from English to other languages, such techniques have hitherto not been applied to historical languages such as Latin. In this work, we make the following two major contributions. First, we test such a strategy on a historical language and propose a new approach in this framework which makes use of existing bilingual corpora instead of native English datasets. Second, we fine-tune a Latin WSD model on the data produced and achieve state-of-the-art results on a standard benchmark for the task. Finally, we release the dataset generated with our approach, which is the largest dataset for Latin WSD to date. This work opens the door to further research, as our approach can be used for different historical and, generally, under-resourced languages.

LLODIA: A Linguistic Linked Open Data Model for Diachronic Analysis
Florentina Armaselu | Chaya Liebeskind | Paola Marongiu | Barbara McGillivray | Giedre Valunaite Oleskeviciene | Elena-Simona Apostol | Ciprian-Octavian Truica | Daniela Gifu
Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024

This article proposes a linguistic linked open data model for diachronic analysis (LLODIA) that combines data derived from diachronic analysis of multilingual corpora with dictionary-based evidence. A humanities use case was devised as a proof of concept that includes examples in five languages (French, Hebrew, Latin, Lithuanian and Romanian) related to various meanings of the term “revolution” considered at different time intervals. The examples were compiled through diachronic word embedding and dictionary alignment.

2023

Graph Databases for Diachronic Language Data Modelling
Barbara McGillivray | Pierluigi Cassotti | Davide Di Pierro | Paola Marongiu | Anas Fahad Khan | Stefano Ferilli | Pierpaolo Basile
Proceedings of the 4th Conference on Language, Data and Knowledge

2018

Challenges in Converting the Index Thomisticus Treebank into Universal Dependencies
Flavio Massimiliano Cecchini | Marco Passarotti | Paola Marongiu | Daniel Zeman
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

This paper describes the changes applied to the original process used to convert the Index Thomisticus Treebank, a corpus including texts in Medieval Latin by Thomas Aquinas, into the annotation style of Universal Dependencies. The changes are made both to harmonise the Universal Dependencies version of the Index Thomisticus Treebank with the two other available Latin treebanks and to fix errors and inconsistencies resulting from the original process. The paper details the treatment of different issues in PoS tagging, lemmatisation and assignment of dependency relations. Finally, it assesses the quality of the new conversion process by providing an evaluation against a gold standard.

Enhancing Universal Dependency Treebanks: A Case Study
Joakim Nivre | Paola Marongiu | Filip Ginter | Jenna Kanerva | Simonetta Montemagni | Sebastian Schuster | Maria Simi
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

We evaluate two cross-lingual techniques for adding enhanced dependencies to existing treebanks in Universal Dependencies. We apply a rule-based system developed for English and a data-driven system trained on Finnish to Swedish and Italian. We find that both systems are accurate enough to bootstrap enhanced dependencies in existing UD treebanks. In the case of Italian, results are even on par with those of a prototype language-specific system.