Document retrieval and question answering in medical documents. A large-scale corpus challenge.

Curea Eric


Abstract
When applied to large datasets, information retrieval typically works by first isolating a subset of documents from the larger collection and then applying low-level processing to their text. This is usually done by assigning index terms to each document in the collection. In this paper we address automatic document classification and index-term detection on large-scale medical corpora. Our methodology employs a linear classifier, and we evaluate our results on the BioASQ training corpus, a collection of 12 million MeSH-indexed medical abstracts. We cover term indexing, result retrieval, and result ranking based on distributed word representations.
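The abstract mentions ranking results with distributed word representations but does not specify the scheme. As a minimal sketch, assuming one common approach (average the word vectors of each text, then rank documents by cosine similarity to the averaged query vector), the idea can be illustrated as follows. The embedding table, its three-dimensional vectors, and the function names `embed`, `cosine`, and `rank` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy embeddings (assumption). A real system would use vectors
# trained on the medical corpus; none are provided by the source.
EMBEDDINGS = {
    "insulin": [0.9, 0.1, 0.0],
    "diabetes": [0.8, 0.2, 0.1],
    "glucose": [0.7, 0.3, 0.0],
    "fracture": [0.0, 0.1, 0.9],
    "bone": [0.1, 0.0, 0.8],
}


def embed(text):
    """Average the vectors of known tokens; return None if no token is known."""
    vecs = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by descending similarity to the query."""
    q = embed(query)
    scored = []
    for doc_id, text in docs.items():
        d = embed(text)
        # Texts with no known tokens get a neutral score of 0.0.
        score = cosine(q, d) if q is not None and d is not None else 0.0
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "bone fracture healing",
    "d2": "insulin and glucose control in diabetes",
}
print(rank("diabetes insulin", docs))
```

Averaging is the simplest way to turn word vectors into a text vector; it ignores word order and weighting, so a production ranker might instead weight terms (e.g. by inverse document frequency) before averaging.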
Anthology ID:
W17-8001
Volume:
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Editors:
Svetla Boytcheva, Kevin Bretonnel Cohen, Guergana Savova, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
1–7
URL:
https://doi.org/10.26615/978-954-452-044-1_001
DOI:
10.26615/978-954-452-044-1_001
Cite (ACL):
Curea Eric. 2017. Document retrieval and question answering in medical documents. A large-scale corpus challenge. In Proceedings of the Biomedical NLP Workshop associated with RANLP 2017, pages 1–7, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Document retrieval and question answering in medical documents. A large-scale corpus challenge. (Eric, RANLP 2017)