Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback

Chengfeng Dou, Ying Zhang, Zhi Jin, Wenpin Jiao, Haiyan Zhao, Yongqiang Zhao, Zhengwei Tao


Abstract
The utilization of large language models for medical dialogue generation has attracted considerable attention due to its potential to enhance response richness and coherence. While previous studies have made strides in optimizing model performance, there is a pressing need to bolster the model’s capacity for diagnostic logic to ensure patient safety. In response to this need, we propose an approach termed preference learning from process feedback (PLPF), which involves integrating the doctor’s diagnostic logic into LLMs. PLPF encompasses three key components: rule modeling, preference data generation, and preference alignment. These components collectively serve to train the model to adhere to the diagnostic process. Our experimental results, utilizing Standardized Patient Testing, demonstrate that PLPF enhances the diagnostic accuracy of the baseline model in medical conversations by 17.6%, surpassing the performance of traditional approaches. Moreover, PLPF exhibits effectiveness in both multi-round and single-round dialogue tasks, thereby highlighting its potential in improving medical dialogue generation. Our dataset is available at https://github.com/Chengfeng-Dou/SpTesting.
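The abstract names preference alignment as the final stage of PLPF. The paper does not specify the exact objective here, but as a hypothetical illustration, a pairwise preference-alignment step over (preferred, dispreferred) dialogue responses could resemble a DPO-style loss; the function name, arguments, and `beta` value below are all assumptions for the sketch, not the authors' implementation.

```python
import math

def preference_alignment_loss(logp_chosen, logp_rejected,
                              ref_logp_chosen, ref_logp_rejected,
                              beta=0.1):
    """DPO-style pairwise loss for one (chosen, rejected) response pair.

    logp_* are the policy model's log-probabilities of each response;
    ref_logp_* are the frozen reference model's log-probabilities.
    The loss shrinks as the policy prefers the chosen response more
    strongly than the reference model does.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

For example, when the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; raising the policy's log-probability on the chosen response lowers the loss.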
Anthology ID:
2024.findings-acl.144
Volume:
Findings of the Association for Computational Linguistics: ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2453–2473
URL:
https://aclanthology.org/2024.findings-acl.144
DOI:
10.18653/v1/2024.findings-acl.144
Bibkey:
Cite (ACL):
Chengfeng Dou, Ying Zhang, Zhi Jin, Wenpin Jiao, Haiyan Zhao, Yongqiang Zhao, and Zhengwei Tao. 2024. Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback. In Findings of the Association for Computational Linguistics: ACL 2024, pages 2453–2473, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback (Dou et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.144.pdf