Knowledge Discovery in COVID-19 Research Literature

Ernesto L. Estevanell-Valladares, Suilan Estevez-Velarde, Alejandro Piad-Morffis, Yoan Gutierrez, Andres Montoyo, Rafael Muñoz, Yudivián Almeida Cruz


Abstract
This paper presents preliminary results of an ongoing project that analyzes the growing body of scientific research published on the COVID-19 pandemic. A general-purpose semantic model is used to double-annotate a batch of 500 sentences manually selected from the CORD-19 corpus. A baseline text-mining pipeline is then designed and evaluated on a large batch of 100,959 sentences. We present a qualitative analysis of the most interesting automatically extracted facts and highlight possible future lines of development. The preliminary results show that general-purpose semantic models are a useful tool for discovering fine-grained knowledge in large corpora of scientific documents.
Anthology ID:
2021.ranlp-1.46
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
402–410
URL:
https://aclanthology.org/2021.ranlp-1.46
Cite (ACL):
Ernesto L. Estevanell-Valladares, Suilan Estevez-Velarde, Alejandro Piad-Morffis, Yoan Gutierrez, Andres Montoyo, Rafael Muñoz, and Yudivián Almeida Cruz. 2021. Knowledge Discovery in COVID-19 Research Literature. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 402–410, Held Online. INCOMA Ltd.
Cite (Informal):
Knowledge Discovery in COVID-19 Research Literature (Estevanell-Valladares et al., RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-1.46.pdf
Data:
CORD-19