BabyFST - Towards a Finite-State Based Computational Model of Ancient Babylonian

Aleksi Sahala, Miikka Silfverberg, Antti Arppe, Krister Lindén


Abstract
Akkadian is a fairly well-resourced extinct language that does not yet have a comprehensive morphological analyzer available. In this paper we describe a general finite-state based morphological model for Babylonian, a southern dialect of the Akkadian language, that can achieve a coverage of up to 97.3% and a recall of up to 93.7% on a token-level lemmatization and POS-tagging task from transcribed input. Since Akkadian word forms exhibit a high degree of morphological ambiguity, in that only 20.1% of running word tokens receive a single unambiguous analysis, we attempt a first pass at weighting our finite-state transducer, using existing extensive Akkadian corpora which have been partially validated for their lemmas and parts-of-speech but not for their entire morphological analyses. The resultant weighted finite-state transducer yields a moderate improvement, so that for 57.4% of the word tokens the highest ranked analysis is the correct one. We conclude with a short discussion on how morphological ambiguity in the analysis of Akkadian could be further reduced with improvements in the training data used in weighting the finite-state transducer, as well as through other, context-based techniques.
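The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that weights are smoothed negative log probabilities of (lemma, POS) pairs counted from a validated corpus, and that a lower weight ranks higher. The function names `train_weights` and `rank_analyses`, the add-one smoothing, and the placeholder lemmas and tags are all hypothetical.

```python
import math
from collections import Counter

def train_weights(validated_pairs):
    """Build a weight function from validated (lemma, POS) pairs.

    Assumption: weight = -log of an add-one smoothed relative frequency,
    so frequent pairs get a low weight and unseen pairs a high one.
    """
    counts = Counter(validated_pairs)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen pairs

    def weight(pair):
        return -math.log((counts[pair] + 1) / (total + vocab))

    return weight

def rank_analyses(analyses, weight):
    """Sort candidate analyses (lemma, POS, morphological tags) by weight.

    Only lemma and POS are scored, mirroring a corpus that is validated
    for lemmas and parts-of-speech but not for full morphological analyses.
    """
    return sorted(analyses, key=lambda a: weight((a[0], a[1])))

# Toy data with placeholder lemmas and tags, not real Akkadian forms.
corpus = [("lemma_a", "N")] * 8 + [("lemma_b", "V")] * 2
weight = train_weights(corpus)

candidates = [
    ("lemma_b", "V", "tags_1"),
    ("lemma_a", "N", "tags_2"),
    ("lemma_a", "N", "tags_3"),
]
ranked = rank_analyses(candidates, weight)
for lemma, pos, tags in ranked:
    print(lemma, pos, tags, round(weight((lemma, pos)), 3))
```

In this sketch the two `lemma_a` analyses receive the same weight, because they differ only in their morphological tags. That shows why training data validated only for lemma and POS cannot, on its own, separate analyses that share both.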
Anthology ID:
2020.lrec-1.479
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3886–3894
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.479
Cite (ACL):
Aleksi Sahala, Miikka Silfverberg, Antti Arppe, and Krister Lindén. 2020. BabyFST - Towards a Finite-State Based Computational Model of Ancient Babylonian. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3886–3894, Marseille, France. European Language Resources Association.
Cite (Informal):
BabyFST - Towards a Finite-State Based Computational Model of Ancient Babylonian (Sahala et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.479.pdf