Aiden Williams


2024

pdf bib
Towards a Corpus of Spoken Maltese: Korpus tal-Malti Mitkellem, KMM
Alexandra (Sandra) Vella | Sarah Agius | Aiden Williams | Claudia Borg
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents the rationale for a “dedicated” corpus of spoken Maltese, Korpus tal-Malti Mitkellem, KMM, ‘Corpus of Spoken Maltese’, based on the concept of a gold-standard Core collection. The Core collection is designed to cater to as wide a variety of user needs as possible whilst respecting basic principles governing corpus design, such as representativeness and balance, and delivering high quality in terms of both audio quality and annotations. An overview is provided of the composition of the current Core corpus of around 20 hours of data and of the human annotation effort involved. We also carry out a small qualitative analysis of the output of a Maltese ASR system and compare it to the human annotators’ output. Initial results are promising, showing that the ASR is robust enough to generate first-pass texts for annotators to work on, thus reducing the human effort, and consequently, the cost involved.

2023

pdf bib
UM-DFKI Maltese Speech Translation
Aiden Williams | Kurt Abela | Rishu Kumar | Martin Bär | Hannah Billinghurst | Kurt Micallef | Ahnaf Mozib Samin | Andrea DeMarco | Lonneke van der Plas | Claudia Borg
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

For the 2023 IWSLT Maltese Speech Translation Task, UM-DFKI jointly presents a cascade solution which achieves 0.6 BLEU. While this is the first time that a Maltese speech translation task has been released by IWSLT, this paper explores previous solutions for other speech translation tasks, focusing primarily on low-resource scenarios. Moreover, we present our method of fine-tuning XLS-R models for Maltese ASR using a collection of multi-lingual speech corpora as well as the fine-tuning of the mBART model for Maltese to English machine translation.