MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain

Chao Jiang, Wei Xu


Abstract
Medical texts are notoriously challenging to read. Properly measuring their readability is the first step towards making them more accessible. Here, we present the first systematic study on fine-grained readability measurements in the medical domain, at both the sentence level and the span level. We first introduce a new dataset, MedReadMe, which consists of manually annotated readability ratings and fine-grained complex span annotations for 4,520 sentences, featuring two novel categories, “Google-Easy” and “Google-Hard”. The dataset supports our quantitative analysis, which covers 650 linguistic features and additional complex span features, to answer “why medical sentences are so hard.” Enabled by our high-quality annotation, we benchmark several state-of-the-art sentence-level readability metrics, including unsupervised, supervised, and prompting-based methods using recently developed large language models (LLMs). Informed by our fine-grained complex span annotation, we find that adding a single feature, capturing the number of jargon spans, into existing readability formulas can significantly improve their correlation with human judgments, and also make them more stable. We will publicly release data and code.
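As a rough illustration of the last finding, the sketch below adds a jargon-count term to one well-known readability formula, the Flesch-Kincaid Grade Level. The abstract does not say which formulas were extended or how the feature is combined with them, so the additive form, the weight `alpha`, the function names, and the naive syllable counter are all assumptions made for illustration, not the authors' method.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of consecutive vowels, with a minimum of 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fkgl(sentence: str) -> float:
    """Flesch-Kincaid Grade Level for a single sentence.

    Standard formula: 0.39 * (words / sentences)
                      + 11.8 * (syllables / words) - 15.59.
    With one sentence, words / sentences is just the word count.
    """
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) + 11.8 * syllables / len(words) - 15.59


def fkgl_with_jargon(sentence: str, num_jargon_spans: int, alpha: float = 1.0) -> float:
    """FKGL plus a weighted jargon-span count.

    `alpha` is a hypothetical weight; in practice it would be fitted
    against human readability ratings rather than set by hand.
    """
    return fkgl(sentence) + alpha * num_jargon_spans


if __name__ == "__main__":
    text = "The patient presented with idiopathic thrombocytopenic purpura."
    # Assume an annotator marked one jargon span in this sentence.
    print(fkgl(text), fkgl_with_jargon(text, num_jargon_spans=1))
```

The number of jargon spans is passed in rather than computed, since it would come from span-level annotation or from a separate complex span detector.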
Anthology ID:
2024.emnlp-main.958
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
17293–17319
URL:
https://aclanthology.org/2024.emnlp-main.958/
DOI:
10.18653/v1/2024.emnlp-main.958
Cite (ACL):
Chao Jiang and Wei Xu. 2024. MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17293–17319, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain (Jiang & Xu, EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.958.pdf