Experiments in Cuneiform Language Identification

Gustavo Henrique Paetzold, Marcos Zampieri


Abstract
This paper presents methods to discriminate between languages and dialects written in Cuneiform script, one of the first writing systems in the world. We report the results obtained by the PZ team in the Cuneiform Language Identification (CLI) shared task organized within the scope of the VarDial Evaluation Campaign 2019. The task included two languages, Sumerian and Akkadian. The latter is divided into six dialects: Old Babylonian, Middle Babylonian peripheral, Standard Babylonian, Neo Babylonian, Late Babylonian, and Neo Assyrian. We approach the task using a meta-classifier trained on various SVM models and show the effectiveness of the system for this task. Our submission achieved an F1 score of 0.738 in discriminating between the seven languages and dialects and ranked fourth among eight teams in the competition.
Anthology ID:
W19-1423
Volume:
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
June
Year:
2019
Address:
Ann Arbor, Michigan
Editors:
Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
209–213
URL:
https://aclanthology.org/W19-1423
DOI:
10.18653/v1/W19-1423
Cite (ACL):
Gustavo Henrique Paetzold and Marcos Zampieri. 2019. Experiments in Cuneiform Language Identification. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 209–213, Ann Arbor, Michigan. Association for Computational Linguistics.
Cite (Informal):
Experiments in Cuneiform Language Identification (Paetzold & Zampieri, VarDial 2019)
PDF:
https://aclanthology.org/W19-1423.pdf