Wenqi Shi
2024
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models
Ran Xu
|
Hejie Cui
|
Yue Yu
|
Xuan Kan
|
Wenqi Shi
|
Yuchen Zhuang
|
May Dongmei Wang
|
Wei Jin
|
Joyce Ho
|
Carl Yang
Findings of the Association for Computational Linguistics ACL 2024
Clinical natural language processing faces challenges like complex medical terminology and clinical contexts. Recently, large language models (LLMs) have shown promise in this domain. Yet, their direct deployment can lead to privacy issues and are constrained by resources. To address this challenge, we delve into synthetic clinical text generation with LLMs for clinical NLP tasks. We propose an innovative, resource-efficient approach, ClinGen, which infuses knowledge into the process. Our model involves clinical knowledge extraction and context-informed LLM prompting. Both clinical topics and writing styles are drawn from external domain-specific knowledge graphs and LLMs to guide data generation. Our extensive empirical study across 8 clinical NLP tasks and 18 datasets reveals that ClinGen consistently enhances performance across various tasks by 7.7%-8.7% on average, effectively aligning the distribution of real datasets and enriching the diversity of generated training instances.
RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records
Ran Xu
|
Wenqi Shi
|
Yue Yu
|
Yuchen Zhuang
|
Bowen Jin
|
May Dongmei Wang
|
Joyce Ho
|
Carl Yang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs). RAM-EHR first collects multiple knowledge sources, converts them into text format, and uses dense retrieval to obtain information related to medical concepts. This strategy addresses the difficulties associated with complex names for the concepts. RAM-EHR then augments the local EHR predictive model co-trained with consistency regularization to capture complementary information from patient visits and summarized knowledge. Experiments on two EHR datasets show the efficacy of RAM-EHR over previous knowledge-enhanced baselines (3.4% gain in AUROC and 7.2% gain in AUPR), emphasizing the effectiveness of the summarized knowledge from RAM-EHR for clinical prediction tasks.