PICO Corpus: A Publicly Available Corpus to Support Automatic Data Extraction from Biomedical Literature

Faith Mutinda, Kongmeng Liew, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki


Abstract
We present a publicly available corpus with detailed annotations describing the core elements of clinical trials: Participants, Intervention, Control, and Outcomes. The corpus consists of 1011 abstracts of breast cancer randomized controlled trials extracted from the PubMed database. The corpus improves on previous corpora by providing detailed outcome annotations that identify the numeric text reporting the number of participants who experience specific outcomes. The corpus will be helpful for developing systems that automatically extract data from randomized controlled trial literature to support evidence-based medicine. Additionally, we demonstrate the feasibility of the corpus using two strong baselines for the named entity recognition task. Most of the entities achieve F1 scores greater than 0.80, demonstrating the quality of the dataset.
Anthology ID:
2022.wiesp-1.4
Volume:
Proceedings of the first Workshop on Information Extraction from Scientific Publications
Month:
November
Year:
2022
Address:
Online
Editors:
Tirthankar Ghosal, Sergi Blanco-Cuaresma, Alberto Accomazzi, Robert M. Patton, Felix Grezes, Thomas Allen
Venue:
WIESP
Publisher:
Association for Computational Linguistics
Pages:
26–31
URL:
https://aclanthology.org/2022.wiesp-1.4
Cite (ACL):
Faith Mutinda, Kongmeng Liew, Shuntaro Yada, Shoko Wakamiya, and Eiji Aramaki. 2022. PICO Corpus: A Publicly Available Corpus to Support Automatic Data Extraction from Biomedical Literature. In Proceedings of the first Workshop on Information Extraction from Scientific Publications, pages 26–31, Online. Association for Computational Linguistics.
Cite (Informal):
PICO Corpus: A Publicly Available Corpus to Support Automatic Data Extraction from Biomedical Literature (Mutinda et al., WIESP 2022)
PDF:
https://aclanthology.org/2022.wiesp-1.4.pdf