Seeing Is Believing! towards Knowledge-Infused Multi-modal Medical Dialogue Generation

Abhisek Tiwari, Shreyangshu Bera, Preeti Verma, Jaithra Varma Manthena, Sriparna Saha, Pushpak Bhattacharyya, Minakshi Dhar, Sarbajeet Tiwari


Abstract
Over the last few years, artificial intelligence-based clinical assistance has gained immense popularity and demand in telemedicine, including automatic disease diagnosis. Patients often describe their signs and symptoms to doctors using visual aids, which provide vital evidence for identifying a medical condition. In addition to learning from our experiences, we learn from well-established theories/knowledge. With the motivation of leveraging visual cues and medical knowledge, we propose a transformer-based, knowledge-infused multi-modal medical dialogue generation (KI-MMDG) framework. In addition, we present a discourse-aware image identifier (DII) that recognizes signs and their severity by leveraging the current conversation context in addition to the image of the signs. We first curate an empathy and severity-aware multi-modal medical dialogue (ES-MMD) corpus in English, which is annotated with intent, symptoms, and visual signs with severity information. Experimental results show the superior performance of the proposed KI-MMDG model over uni-modal and non-knowledge infused generative models, demonstrating the importance of visual signs and knowledge infusion in symptom investigation and diagnosis. We also observed that the DII model surpasses the existing state-of-the-art model by 7.84%, indicating the crucial significance of dialogue context for identifying a sign image surfaced during conversations. The code and dataset are available at https://github.com/NLP-RL/KI-MMDG.
Anthology ID:
2024.lrec-main.1264
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
14513–14523
URL:
https://aclanthology.org/2024.lrec-main.1264
Cite (ACL):
Abhisek Tiwari, Shreyangshu Bera, Preeti Verma, Jaithra Varma Manthena, Sriparna Saha, Pushpak Bhattacharyya, Minakshi Dhar, and Sarbajeet Tiwari. 2024. Seeing Is Believing! towards Knowledge-Infused Multi-modal Medical Dialogue Generation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14513–14523, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Seeing Is Believing! towards Knowledge-Infused Multi-modal Medical Dialogue Generation (Tiwari et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1264.pdf