DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model

Ondřej Herman, Vít Suchomel, Vít Baisa, Pavel Rychlý


Abstract
In this paper we investigate two approaches to discriminating between similar languages: an expectation–maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. These methods reached accuracies of 86.6% and 88.3%, respectively, on set A of the DSL Shared Task 2016 competition.
Anthology ID:
W16-4815
Volume:
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi
Venue:
VarDial
Publisher:
The COLING 2016 Organizing Committee
Pages:
114–118
URL:
https://aclanthology.org/W16-4815
Cite (ACL):
Ondřej Herman, Vít Suchomel, Vít Baisa, and Pavel Rychlý. 2016. DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model. In Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pages 114–118, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation–Maximization and Chunk-based Language Model (Herman et al., VarDial 2016)
PDF:
https://aclanthology.org/W16-4815.pdf