Nathan Hill


2023

Representing and Computing Uncertainty in Phonological Reconstruction
Johann-Mattis List | Nathan Hill | Robert Forkel | Frederic Blum
Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change

Despite the inherently fuzzy nature of reconstructions in historical linguistics, most scholars do not represent their uncertainty when proposing proto-forms. With the increasing success of recently proposed approaches to automating certain aspects of the traditional comparative method, the formal representation of proto-forms has also improved. This formalization makes it possible to address both the representation and the computation of uncertainty. Building on recent advances in supervised phonological reconstruction, during which an algorithm learns how to reconstruct words in a given proto-language relying on previously annotated data, and inspired by improved methods for automated word prediction from cognate sets, we present a new framework that allows for the representation of uncertainty in linguistic reconstruction and also includes a workflow for the computation of fuzzy reconstructions from linguistic data.

2022

A New Framework for Fast Automated Phonological Reconstruction Using Trimmed Alignments and Sound Correspondence Patterns
Johann-Mattis List | Robert Forkel | Nathan Hill
Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change

Computational approaches in historical linguistics have been increasingly applied during the past decade and many new methods that implement parts of the traditional comparative method have been proposed. Despite these increased efforts, there are not many easy-to-use and fast approaches for the task of phonological reconstruction. Here we present a new framework that combines state-of-the-art techniques for automated sequence comparison with novel techniques for phonetic alignment analysis and sound correspondence pattern detection to allow for the supervised reconstruction of word forms in ancestral languages. We test the method on a new dataset covering six groups from three different language families. The results show that our method yields promising results while at the same time being not only fast but also easy to apply and expand.

The SIGTYP 2022 Shared Task on the Prediction of Cognate Reflexes
Johann-Mattis List | Ekaterina Vylomova | Robert Forkel | Nathan Hill | Ryan Cotterell
Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

This study describes the structure and the results of the SIGTYP 2022 shared task on the prediction of cognate reflexes from multilingual wordlists. We asked participants to submit systems that would predict words in individual languages with the help of cognate words from related languages. Training and surprise data were based on standardized multilingual wordlists from several language families. Four teams submitted a total of eight systems, including both neural and non-neural systems, as well as systems adjusted to the task and systems using more general settings. While all systems showed a rather promising performance, reflecting the overwhelming regularity of sound change, the best performance throughout was achieved by a system based on convolutional networks originally designed for image restoration.

NLP Pipeline for Annotating (Endangered) Tibetan and Newar Varieties
Christian Faggionato | Nathan Hill | Marieke Meelen
Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference

In this paper we present our work-in-progress on a fully-implemented pipeline to create deeply-annotated corpora of a number of historical and contemporary Tibetan and Newar varieties. Our off-the-shelf tools allow researchers to create corpora with five different layers of annotation, ranging from morphosyntactic to information-structural annotation. We build on and optimise existing tools (in line with FAIR principles), as well as develop new ones, and show how they can be adapted to other Tibetan and Newar languages, most notably modern endangered languages that are both extremely low-resourced and under-researched.

2021

User-friendly Automatic Transcription of Low-resource Languages: Plugging ESPnet into Elpis
Oliver Adams | Benjamin Galliot | Guillaume Wisniewski | Nicholas Lambourne | Ben Foley | Rahasya Sanders-Dwyer | Janet Wiles | Alexis Michaud | Séverine Guillaume | Laurent Besacier | Christopher Cox | Katya Aplonova | Guillaume Jacques | Nathan Hill
Proceedings of the 4th Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)