COVID-19 Mythbusters in World Languages

Mana Ashida, Jin-Dong Kim, Seunghun Lee


Abstract
This paper introduces a multilingual database containing translated texts of COVID-19 mythbusters. The database holds the original English texts, published by the World Health Organization (WHO), together with translations into 115 languages. The paper then presents preliminary analyses of Latin-alphabet-based texts to assess the potential of the database as a resource for multilingual linguistic analyses. Although the amount of translated text in each language was small, character bigrams with normalization (lowercasing and removal of diacritics) turned out to be an effective proxy for measuring the similarity of languages, and an affinity ranking of language pairs could be obtained. Additionally, a hierarchical clustering analysis was performed using the character bigram overlap ratio of every possible pair of languages. The result shows clusters of Germanic languages, Romance languages, and Southern Bantu languages. In sum, the multilingual database not only offers a fixed set of materials in numerous languages, but also serves as a preliminary tool for identifying language families using a text-based similarity measure, the bigram overlap ratio.
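The similarity measure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula for the bigram overlap ratio, so the code assumes a Jaccard-style definition (shared bigrams divided by all bigrams observed in either text). The function names and the three sample sentences are hypothetical; the sentences are toy inputs, not entries from the database.

```python
import unicodedata
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase the text and remove diacritics (Unicode combining marks)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def char_bigrams(text: str) -> set[str]:
    """Collect the set of within-word character bigrams of the normalized text."""
    bigrams: set[str] = set()
    for word in normalize(text).split():
        # Keep letters only, so punctuation and digits do not form bigrams.
        letters = "".join(ch for ch in word if ch.isalpha())
        bigrams.update(letters[i:i + 2] for i in range(len(letters) - 1))
    return bigrams


def overlap_ratio(a: set[str], b: set[str]) -> float:
    """ASSUMED definition: |a & b| / |a | b| (Jaccard index); 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pairs(texts: dict[str, str]) -> list[tuple[str, str, float]]:
    """Score every language pair and sort from most to least similar."""
    profiles = {lang: char_bigrams(text) for lang, text in texts.items()}
    scored = [
        (x, y, overlap_ratio(profiles[x], profiles[y]))
        for x, y in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy sentences, NOT taken from the database.
    samples = {
        "es": "El virus no se transmite por el agua",
        "pt": "O vírus não se transmite pela água",
        "de": "Das Virus wird nicht über Wasser übertragen",
    }
    for lang_a, lang_b, score in rank_pairs(samples):
        print(f"{lang_a}-{lang_b}: {score:.3f}")
```

The resulting pairwise scores could be converted to distances (for example, one minus the ratio) and passed to any standard hierarchical clustering routine. Note that stripping combining marks leaves letters without a canonical decomposition, such as "ø" or "ß", unchanged; whether the paper handles such letters differently is not stated in the abstract.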
Anthology ID:
2022.lrec-1.326
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3048–3055
URL:
https://aclanthology.org/2022.lrec-1.326
Cite (ACL):
Mana Ashida, Jin-Dong Kim, and Seunghun Lee. 2022. COVID-19 Mythbusters in World Languages. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3048–3055, Marseille, France. European Language Resources Association.
Cite (Informal):
COVID-19 Mythbusters in World Languages (Ashida et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.326.pdf