Ayush Singh


2024

pdf bib
LLM-Based Section Identifiers Excel on Open Source but Stumble in Real World Applications
Saranya Krishnamoorthy | Ayush Singh | Shabnam Tafreshi
Proceedings of the 6th Clinical Natural Language Processing Workshop

Electronic health records (EHR) even though a boon for healthcare practitioners, are grow- ing convoluted and longer every day. Sifting around these lengthy EHRs is taxing and be- comes a cumbersome part of physician-patient interaction. Several approaches have been pro- posed to help alleviate this prevalent issue ei- ther via summarization or sectioning, however, only a few approaches have truly been helpful in the past. With the rise of automated methods, machine learning (ML) has shown promise in solving the task of identifying relevant sections in EHR. However, most ML methods rely on labeled data which is difficult to get in health- care. Large language models (LLMs) on the other hand, have performed impressive feats in natural language processing (NLP), that too in a zero-shot manner, i.e. without any labeled data. To that end, we propose using LLMs to identify relevant section headers. We find that GPT-4 can effectively solve the task on both zero and few-shot settings as well as segment dramatically better than state-of-the-art meth- ods. Additionally, we also annotate a much harder real world dataset and find that GPT-4 struggles to perform well, alluding to further research and harder benchmarks.

pdf bib
Can GPT Redefine Medical Understanding? Evaluating GPT on Biomedical Machine Reading Comprehension
Shubham Vatsal | Ayush Singh
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

Large language models (LLMs) have shown remarkable performance on many tasks in different domains. However, their performance in contextual biomedical machine reading comprehension (MRC) has not been evaluated in depth. In this work, we evaluate GPT on four contextual biomedical MRC benchmarks. We experiment with different conventional prompting techniques as well as introduce our own novel prompting method. To solve some of the retrieval problems inherent to LLMs, we propose a prompting strategy named Implicit Retrieval Augmented Generation (RAG) that alleviates the need for using vector databases to retrieve important chunks in traditional RAG setups. Moreover, we report qualitative assessments on the natural language generation outputs from our approach. The results show that our new prompting technique is able to get the best performance in two out of four datasets and ranks second in rest of them. Experiments show that modern-day LLMs like GPT even in a zero-shot setting can outperform supervised models, leading to new state-of-the-art (SoTA) results on two of the benchmarks.

pdf bib
Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development
Pranab Sahoo | Ayush Singh | Sriparna Saha | Aman Chadha | Samrat Mondal
Findings of the Association for Computational Linguistics ACL 2024

The mining of adverse drug events (ADEs) is pivotal in pharmacovigilance, enhancing patient safety by identifying potential risks associated with medications, facilitating early detection of adverse events, and guiding regulatory decision-making. Traditional ADE detection methods are reliable but slow, not easily adaptable to large-scale operations, and offer limited information. With the exponential increase in data sources like social media content, biomedical literature, and Electronic Medical Records (EMR), extracting relevant ADE-related information from these unstructured texts is imperative. Previous ADE mining studies have focused on text-based methodologies, overlooking visual cues, limiting contextual comprehension, and hindering accurate interpretation. To address this gap, we present a MultiModal Adverse Drug Event (MMADE) detection dataset, merging ADE-related textual information with visual aids. Additionally, we introduce a framework that leverages the capabilities of LLMs and VLMs for ADE detection by generating detailed descriptions of medical images depicting ADEs, aiding healthcare professionals in visually identifying adverse events. Using our MMADE dataset, we showcase the significance of integrating visual cues from images to enhance overall performance. This approach holds promise for patient safety, ADE awareness, and healthcare accessibility, paving the way for further exploration in personalized healthcare.

2023

pdf bib
Taxonomy-Based Automation of Prior Approval Using Clinical Guidelines
Saranya Krishnamoorthy | Ayush Singh
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Performing prior authorization on patients in a medical facility is a time-consuming and challenging task for insurance companies. Automating the clinical decisions that lead to authorization can reduce the time that staff spend executing such procedures. To better facilitate such critical decision making, we present an automated approach to predict one of the challenging tasks in the process called primary clinical indicator prediction, which is the outcome of this procedure. The proposed solution is to create a taxonomy to capture the main categories in primary clinical indicators. Our approach involves an important step of selecting what is known as the “primary indicator” – one of the several heuristics based on clinical guidelines that are published and publicly available. A taxonomy based PI classification system was created to help in the recognition of PIs from free text in electronic health records (EHRs). This taxonomy includes comprehensive explanations of each PI, as well as examples of free text that could be used to detect each PI. The major contribution of this work is to introduce a taxonomy created by three professional nurses with many years of experience. We experiment with several state-of-the-art supervised and unsupervised techniques with a focus on prior approval for spinal imaging. The results indicate that the proposed taxonomy is capable of increasing the performance of unsupervised approaches by up to 10 F1 points. Further, in the supervised setting, we achieve an F1 score of 0.61 using a conventional technique based on term frequency–inverse document frequency that outperforms other deep-learning approaches.

2022

pdf bib
CLPT: A Universal Annotation Scheme and Toolkit for Clinical Language Processing
Saranya Krishnamoorthy | Yanyi Jiang | William Buchanan | Ayush Singh | John Ortega
Proceedings of the 4th Clinical Natural Language Processing Workshop

With the abundance of natural language processing (NLP) frameworks and toolkits being used in the clinical arena, a new challenge has arisen - how do technologists collaborate across several projects in an easy way? Private sector companies are usually not willing to share their work due to intellectual property rights and profit-bearing decisions. Therefore, the annotation schemes and toolkits that they use are rarely shared with the wider community. We present the clinical language pipeline toolkit (CLPT) and its corresponding annotation scheme called the CLAO (Clinical Language Annotation Object) with the aim of creating a way to share research results and other efforts through a software solution. The CLAO is a unified annotation scheme for clinical technology processing (CTP) projects that forms part of the CLPT and is more reliable than previous standards such as UIMA, BioC, and cTakes for annotation searches, insertions, and deletions. Additionally, it offers a standardized object that can be exchanged through an API that the authors release publicly for CTP project inclusion.