Evaluating Data Augmentation for Medication Identification in Clinical Notes

Jordan Koontz, Maite Oronoz, Alicia Pérez


Abstract
We evaluate whether data augmentation improves the generalizability of a Named Entity Recognition (NER) model for medication identification in clinical notes. We compare two disparate methods for creating synthetic training examples: mention replacement and a generative model. Through experiments on the n2c2 2022 Track 1 Contextualized Medication Event Extraction data set, we show that augmenting with supplemental examples generated by GPT-3 can boost the performance of a transformer-based model when the training set is small.
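
To make the mention-replacement idea concrete, here is a minimal sketch in Python. It is not the authors' implementation, which the abstract does not describe. It assumes BIO-tagged token sequences, a hypothetical entity label `Drug`, and a hypothetical `lexicon` that maps each label to candidate mentions (for example, mentions collected from the training set). The function names `extract_spans` and `replace_mentions` are illustrative only.

```python
import random


def extract_spans(tags):
    """Return (start, end, label) for each BIO-tagged span; end is exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # still inside the current span
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def replace_mentions(tokens, tags, lexicon, rng, p=1.0):
    """Swap each entity mention for a same-label mention with probability p.

    `lexicon` (an assumption of this sketch) maps a label to a list of
    mentions, each mention being a non-empty list of tokens.
    """
    new_tokens, new_tags, cursor = [], [], 0
    for start, end, label in extract_spans(tags):
        # Copy the untouched context before the mention.
        new_tokens.extend(tokens[cursor:start])
        new_tags.extend(tags[cursor:start])
        candidates = lexicon.get(label, [])
        if candidates and rng.random() < p:
            mention = rng.choice(candidates)
        else:
            mention = tokens[start:end]  # keep the original mention
        # Re-derive BIO tags, since the new mention may differ in length.
        new_tokens.extend(mention)
        new_tags.extend(["B-" + label] + ["I-" + label] * (len(mention) - 1))
        cursor = end
    new_tokens.extend(tokens[cursor:])
    new_tags.extend(tags[cursor:])
    return new_tokens, new_tags


if __name__ == "__main__":
    # Hypothetical lexicon and sentence, for illustration only.
    lexicon = {"Drug": [["metformin"], ["warfarin", "sodium"]]}
    tokens = ["Started", "aspirin", "daily"]
    tags = ["O", "B-Drug", "O"]
    print(replace_mentions(tokens, tags, lexicon, random.Random(0)))
```

Regenerating the tags from the replacement's length keeps the labels aligned when a one-token mention is swapped for a multi-token one, or the reverse.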
Anthology ID:
2023.ranlp-1.63
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
578–585
URL:
https://aclanthology.org/2023.ranlp-1.63
Cite (ACL):
Jordan Koontz, Maite Oronoz, and Alicia Pérez. 2023. Evaluating Data Augmentation for Medication Identification in Clinical Notes. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 578–585, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Evaluating Data Augmentation for Medication Identification in Clinical Notes (Koontz et al., RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.63.pdf