When Is a Name Sensitive? Eponyms in Clinical Text and Implications for De-Identification

Thomas Vakili, Tyr Hullmann, Aron Henriksson, Hercules Dalianis


Abstract
Clinical data, in the form of electronic health records, are rich resources that can be tapped using natural language processing. At the same time, they contain very sensitive information that must be protected. One strategy is to remove or obscure data using automatic de-identification. However, the detection of sensitive data can yield false positives. This is especially true for tokens that are similar in form to sensitive entities, such as eponyms. These names tend to refer to medical procedures or diagnoses rather than specific persons. Previous research has shown that automatic de-identification systems often misclassify eponyms as names, leading to a loss of valuable medical information. In this study, we estimate the prevalence of eponyms in a real Swedish clinical corpus. Furthermore, we demonstrate that modern transformer-based de-identification systems are more accurate in distinguishing between names and eponyms than previous approaches.
Anthology ID:
2024.caldpseudo-1.9
Volume:
Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Elena Volodina, David Alfter, Simon Dobnik, Therese Lindström Tiedemann, Ricardo Muñoz Sánchez, Maria Irena Szawerna, Xuan-Son Vu
Venues:
CALD-pseudo | WS
Publisher:
Association for Computational Linguistics
Pages:
76–80
URL:
https://aclanthology.org/2024.caldpseudo-1.9
Cite (ACL):
Thomas Vakili, Tyr Hullmann, Aron Henriksson, and Hercules Dalianis. 2024. When Is a Name Sensitive? Eponyms in Clinical Text and Implications for De-Identification. In Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024), pages 76–80, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
When Is a Name Sensitive? Eponyms in Clinical Text and Implications for De-Identification (Vakili et al., CALD-pseudo-WS 2024)
PDF:
https://aclanthology.org/2024.caldpseudo-1.9.pdf
Video:
https://aclanthology.org/2024.caldpseudo-1.9.mp4