Ozlem Uzuner

Also published as: Özlem Uzuner


2024

pdf bib
MasonTigers at SemEval-2024 Task 8: Performance Analysis of Transformer-based Models on Machine-Generated Text Detection
Sadiya Sayara Chowdhury Puspo | Nishat Raihan | Dhiman Goswami | Al Nahian Bin Emran | Amrita Ganguly | Özlem Uzuner
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper presents the MasonTigers entryto the SemEval-2024 Task 8 - Multigenerator, Multidomain, and Multilingual BlackBox Machine-Generated Text Detection. Thetask encompasses Binary Human-Written vs.Machine-Generated Text Classification (TrackA), Multi-Way Machine-Generated Text Classification (Track B), and Human-Machine MixedText Detection (Track C). Our best performing approaches utilize mainly the ensemble ofdiscriminator transformer models along withsentence transformer and statistical machinelearning approaches in specific cases. Moreover, Zero shot prompting and fine-tuning ofFLAN-T5 are used for Track A and B.

pdf bib
A Novel Corpus of Annotated Medical Imaging Reports and Information Extraction Results Using BERT-based Language Models
Namu Park | Kevin Lybarger | Giridhar Kaushik Ramachandran | Spencer Lewis | Aashka Damani | Özlem Uzuner | Martin Gunn | Meliha Yetisgen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Medical imaging is critical to the diagnosis, surveillance, and treatment of many health conditions, including oncological, neurological, cardiovascular, and musculoskeletal disorders, among others. Radiologists interpret these complex, unstructured images and articulate their assessments through narrative reports that remain largely unstructured. This unstructured narrative must be converted into a structured semantic representation to facilitate secondary applications such as retrospective analyses or clinical decision support. Here, we introduce the Corpus of Annotated Medical Imaging Reports (CAMIR), which includes 609 annotated radiology reports from three imaging modality types: Computed Tomography, Magnetic Resonance Imaging, and Positron Emission Tomography-Computed Tomography. Reports were annotated using an event-based schema that captures clinical indications, lesions, and medical problems. Each event consists of a trigger and multiple arguments, and a majority of the argument types, including anatomy, normalize the spans to pre-defined concepts to facilitate secondary use. CAMIR uniquely combines a granular event structure and concept normalization. To extract CAMIR events, we explored two BERT (Bi-directional Encoder Representation from Transformers)-based architectures, including an existing architecture (mSpERT) that jointly extracts all event information and a multi-step approach (PL-Marker++) that we augmented for the CAMIR schema.

pdf bib
Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods
Yujuan Fu | Giridhar Kaushik Ramachandran | Nicholas J. Dobbins | Namu Park | Michael Leu | Abby R. Rosenberg | Kevin Lybarger | Fei Xia | Özlem Uzuner | Meliha Yetisgen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Social determinants of health (SDoH) play a critical role in shaping health outcomes, particularly in pediatric populations where interventions can have long-term implications. SDoH are frequently studied in the Electronic Health Record (EHR), which provides a rich repository for diverse patient data. In this work, we present a novel annotated corpus, the Pediatric Social History Annotation Corpus (PedSHAC), and evaluate the automatic extraction of detailed SDoH representations using fine-tuned and in-context learning methods with Large Language Models (LLMs). PedSHAC comprises annotated social history sections from 1,260 clinical notes obtained from pediatric patients within the University of Washington (UW) hospital system. Employing an event-based annotation scheme, PedSHAC captures ten distinct health determinants to encompass living and economic stability, prior trauma, education access, substance use history, and mental health with an overall annotator agreement of 81.9 F1. Our proposed fine-tuning LLM-based extractors achieve high performance at 78.4 F1 for event arguments. In-context learning approaches with GPT-4 demonstrate promise for reliable SDoH extraction with limited annotated examples, with extraction performance at 82.3 F1 for event triggers.

pdf bib
Enhancing Consumer Health Question Reformulation: Chain-of-Thought Prompting Integrating Focus, Type, and User Knowledge Level
Jooyeon Lee | Luan Huy Pham | Özlem Uzuner
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024

In this paper, we explore consumer health question (CHQ) reformulation, focusing on enhancing the quality of reformation of questions without considering interest shifts. Our study introduces the use of the NIH GARD website as a gold standard dataset for this specific task, emphasizing its relevance and applicability. Additionally, we developed other datasets consisting of related questions scraped from Google, Bing, and Yahoo. We augmented, evaluated and analyzed the various datasets, demonstrating that the reformulation task closely resembles the question entailment generation task. Our approach, which integrates the Focus and Type of consumer inquiries, represents a significant advancement in the field of question reformulation. We provide a comprehensive analysis of different methodologies, offering insights into the development of more effective and user-centric AI systems for consumer health support.

2023

pdf bib
M1437 at BLP-2023 Task 2: Harnessing Bangla Text for Sentiment Analysis: A Transformer-based Approach
Majidur Rahman | Ozlem Uzuner
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)

Analyzing public sentiment on social media is helpful in understanding the public’s emotions about any given topic. While numerous studies have been conducted in this field, there has been limited research on Bangla social media data. Team M1437 from George Mason University participated in the Sentiment Analysis shared task of the Bangla Language Processing (BLP) Workshop at EMNLP-2023. The team fine-tuned various BERT-based Transformer architectures to solve the task. This article shows that BanglaBERTlarge, a language model pre-trained on Bangla text, outperformed other BERT-based models. This model achieved an F1 score of 73.15% and top position in the development phase, was further tuned with external training data, and achieved an F1 score of 70.36% in the evaluation phase, securing the fourteenth place on the leaderboard. The F1 score on the test set, when BanglaBERTlarge was trained without external training data, was 71.54%.

pdf bib
MasonNLP+ at SemEval-2023 Task 8: Extracting Medical Questions, Experiences and Claims from Social Media using Knowledge-Augmented Pre-trained Language Models
Giridhar Kaushik Ramachandran | Haritha Gangavarapu | Kevin Lybarger | Ozlem Uzuner
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

In online forums like Reddit, users share their experiences with medical conditions and treatments, including making claims, asking questions, and discussing the effects of treatments on their health. Building systems to understand this information can effectively monitor the spread of misinformation and verify user claims. The Task-8 of the 2023 International Workshop on Semantic Evaluation focused on medical applications, specifically extracting patient experience- and medical condition-related entities from user posts on social media. The Reddit Health Online Talk (RedHot) corpus contains posts from medical condition-related subreddits with annotations characterizing the patient experience and medical conditions. In Subtask-1, patient experience is characterized by personal experience, questions, and claims. In Subtask-2, medical conditions are characterized by population, intervention, and outcome. For the automatic extraction of patient experiences and medical condition information, as a part of the challenge, we proposed language-model-based extraction systems that ranked $3ˆ{rd}$ on both subtasks’ leaderboards. In this work, we describe our approach and, in addition, explore the automatic extraction of this information using domain-specific language models and the inclusion of external knowledge.

pdf bib
Prompt-based Extraction of Social Determinants of Health Using Few-shot Learning
Giridhar Kaushik Ramachandran | Yujuan Fu | Bin Han | Kevin Lybarger | Nic Dobbins | Ozlem Uzuner | Meliha Yetisgen
Proceedings of the 5th Clinical Natural Language Processing Workshop

Social determinants of health (SDOH) documented in the electronic health record through unstructured text are increasingly being studied to understand how SDOH impacts patient health outcomes. In this work, we utilize the Social History Annotation Corpus (SHAC), a multi-institutional corpus of de-identified social history sections annotated for SDOH, including substance use, employment, and living status information. We explore the automatic extraction of SDOH information with SHAC in both standoff and inline annotation formats using GPT-4 in a one-shot prompting setting. We compare GPT-4 extraction performance with a high-performing supervised approach and perform thorough error analyses. Our prompt-based GPT-4 method achieved an overall 0.652 F1 on the SHAC test set, similar to the 7th best-performing system among all teams in the n2c2 challenge with SHAC.

2022

pdf bib
MNLP at FinCausal2022: Nested NER with a Generative Model
Jooyeon Lee | Luan Huy Pham | Özlem Uzuner
Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022

This paper describes work performed for the FinCasual 2022 Shared Task “Financial Document Causality Detection” (FinCausal 2022). As the name implies, the task involves extraction of casual and consequential elements from financial text. Our approach focuses employing Nested NER using the Text-to-Text Transformer (T5) generative transformer models while applying different combinations of datasets and tagging methods. Our system reports accuracy of 79% in Exact Match comparison and F-measure score of 92% token level measurement.

2021

pdf bib
SemEval-2021 Task 10: Source-Free Domain Adaptation for Semantic Processing
Egoitz Laparra | Xin Su | Yiyun Zhao | Özlem Uzuner | Timothy Miller | Steven Bethard
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents the Source-Free Domain Adaptation shared task held within SemEval-2021. The aim of the task was to explore adaptation of machine-learning models in the face of data sharing constraints. Specifically, we consider the scenario where annotations exist for a domain but cannot be shared. Instead, participants are provided with models trained on that (source) data. Participants also receive some labeled data from a new (development) domain on which to explore domain adaptation algorithms. Participants are then tested on data representing a new (target) domain. We explored this scenario with two different semantic tasks: negation detection (a text classification task) and time expression recognition (a sequence tagging task).

pdf bib
Leveraging Offensive Language for Sarcasm and Sentiment Detection in Arabic
Fatemah Husain | Ozlem Uzuner
Proceedings of the Sixth Arabic Natural Language Processing Workshop

Sarcasm detection is one of the top challenging tasks in text classification, particularly for informal Arabic with high syntactic and semantic ambiguity. We propose two systems that harness knowledge from multiple tasks to improve the performance of the classifier. This paper presents the systems used in our participation to the two sub-tasks of the Sixth Arabic Natural Language Processing Workshop (WANLP); Sarcasm Detection and Sentiment Analysis. Our methodology is driven by the hypothesis that tweets with negative sentiment and tweets with sarcasm content are more likely to have offensive content, thus, fine-tuning the classification model using large corpus of offensive language, supports the learning process of the model to effectively detect sentiment and sarcasm contents. Results demonstrate the effectiveness of our approach for sarcasm detection task over sentiment analysis task.

pdf bib
MNLP at MEDIQA 2021: Fine-Tuning PEGASUS for Consumer Health Question Summarization
Jooyeon Lee | Huong Dang | Ozlem Uzuner | Sam Henry
Proceedings of the 20th Workshop on Biomedical Language Processing

This paper details a Consumer Health Question (CHQ) summarization model submitted to MEDIQA 2021 for shared task 1: Question Summarization. Many CHQs are composed of multiple sentences with typos or unnecessary information, which can interfere with automated question answering systems. Question summarization mitigates this issue by removing this unnecessary information, aiding automated systems in generating a more accurate summary. Our summarization approach focuses on applying multiple pre-processing techniques, including question focus identification on the input and the development of an ensemble method to combine question focus with an abstractive summarization method. We use the state-of-art abstractive summarization model, PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization), to generate abstractive summaries. Our experiments show that using our ensemble method, which combines abstractive summarization with question focus identification, improves performance over using summarization alone. Our model shows a ROUGE-2 F-measure of 11.14% against the official test dataset.

2020

pdf bib
Ensemble BERT for Classifying Medication-mentioning Tweets
Huong Dang | Kahyun Lee | Sam Henry | Özlem Uzuner
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

Twitter is a valuable source of patient-generated data that has been used in various population health studies. The first step in many of these studies is to identify and capture Twitter messages (tweets) containing medication mentions. In this article, we describe our submission to Task 1 of the Social Media Mining for Health Applications (SMM4H) Shared Task 2020. This task challenged participants to detect tweets that mention medications or dietary supplements in a natural, highly imbalance dataset. Our system combined a handcrafted preprocessing step with an ensemble of 20 BERT-based classifiers generated by dividing the training dataset into subsets using 10-fold cross validation and exploiting two BERT embedding models. Our system ranked first in this task, and improved the average F1 score across all participating teams by 19.07% with a precision, recall, and F1 on the test set of 83.75%, 87.01%, and 85.35% respectively.

pdf bib
SalamNET at SemEval-2020 Task 12: Deep Learning Approach for Arabic Offensive Language Detection
Fatemah Husain | Jooyeon Lee | Sam Henry | Ozlem Uzuner
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes SalamNET, an Arabic offensive language detection system that has been submitted to SemEval 2020 shared task 12: Multilingual Offensive Language Identification in Social Media. Our approach focuses on applying multiple deep learning models and conducting in depth error analysis of results to provide system implications for future development considerations. To pursue our goal, a Recurrent Neural Network (RNN), a Gated Recurrent Unit (GRU), and Long-Short Term Memory (LSTM) models with different design architectures have been developed and evaluated. The SalamNET, a Bi-directional Gated Recurrent Unit (Bi-GRU) based model, reports a macro-F1 score of 0.83%

2019

pdf bib
CLPsych 2019 Shared Task: Predicting the Degree of Suicide Risk in Reddit Posts
Ayah Zirikly | Philip Resnik | Özlem Uzuner | Kristy Hollingshead
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

The shared task for the 2019 Workshop on Computational Linguistics and Clinical Psychology (CLPsych’19) introduced an assessment of suicide risk based on social media postings, using data from Reddit to identify users at no, low, moderate, or severe risk. Two variations of the task focused on users whose posts to the r/SuicideWatch subreddit indicated they might be at risk; a third task looked at screening users based only on their more everyday (non-SuicideWatch) posts. We received submissions from 15 different teams, and the results provide progress and insight into the value of language signal in helping to predict risk level.

pdf bib
Deep Learning for Identification of Adverse Effect Mentions In Twitter Data
Paul Barry | Ozlem Uzuner
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

Social Media Mining for Health Applications (SMM4H) Adverse Effect Mentions Shared Task challenges participants to accurately identify spans of text within a tweet that correspond to Adverse Effects (AEs) resulting from medication usage (Weissenbacher et al., 2019). This task features a training data set of 2,367 tweets, in addition to a 1,000 tweet evaluation data set. The solution presented here features a bidirectional Long Short-term Memory Network (bi-LSTM) for the generation of character-level embeddings. It uses a second bi-LSTM trained on both character and token level embeddings to feed a Conditional Random Field (CRF) which provides the final classification. This paper further discusses the deep learning algorithms used in our solution.

2018

pdf bib
Enhancing Cohesion and Coherence of Fake Text to Improve Believability for Deceiving Cyber Attackers
Prakruthi Karuna | Hemant Purohit | Özlem Uzuner | Sushil Jajodia | Rajesh Ganesan
Proceedings of the First International Workshop on Language Cognition and Computational Models

Ever increasing ransomware attacks and thefts of intellectual property demand cybersecurity solutions to protect critical documents. One emerging solution is to place fake text documents in the repository of critical documents for deceiving and catching cyber attackers. We can generate fake text documents by obscuring the salient information in legit text documents. However, the obscuring process can result in linguistic inconsistencies, such as broken co-references and illogical flow of ideas across the sentences, which can discern the fake document and render it unbelievable. In this paper, we propose a novel method to generate believable fake text documents by automatically improving the linguistic consistency of computer-generated fake text. Our method focuses on enhancing syntactic cohesion and semantic coherence across discourse segments. We conduct experiments with human subjects to evaluate the effect of believability improvements in distinguishing legit texts from fake texts. Results show that the probability to distinguish legit texts from believable fake texts is consistently lower than from fake texts that have not been improved in believability. This indicates the effectiveness of our method in generating believable fake text.

2016

pdf bib
Feature-Augmented Neural Networks for Patient Note De-identification
Ji Young Lee | Franck Dernoncourt | Özlem Uzuner | Peter Szolovits
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

Patient notes contain a wealth of information of potentially great interest to medical investigators. However, to protect patients’ privacy, Protected Health Information (PHI) must be removed from the patient notes before they can be legally released, a process known as patient note de-identification. The main objective for a de-identification system is to have the highest possible recall. Recently, the first neural-network-based de-identification system has been proposed, yielding state-of-the-art results. Unlike other systems, it does not rely on human-engineered features, which allows it to be quickly deployed, but does not leverage knowledge from human experts or from electronic health records (EHRs). In this work, we explore a method to incorporate human-engineered features as well as features derived from EHRs to a neural-network-based de-identification system. Our results show that the addition of features, especially the EHR-derived features, further improves the state-of-the-art in patient note de-identification, including for some of the most sensitive PHI types such as patient names. Since in a real-life setting patient notes typically come with EHRs, we recommend developers of de-identification systems to leverage the information EHRs contain.

2014

pdf bib
Biomedical/Clinical NLP
Ozlem Uzuner | Meliha Yetişgen | Amber Stubbs
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Tutorial Abstracts

2010

pdf bib
Extracting Medication Information from Discharge Summaries
Scott Halgrim | Fei Xia | Imre Solti | Eithon Cadag | Özlem Uzuner
Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents

pdf bib
Does negation really matter?
Ira Goldstein | Özlem Uzuner
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

2006

pdf bib
Role of Local Context in Automatic Deidentification of Ungrammatical, Fragmented Text
Tawanda Sibanda | Ozlem Uzuner | Ozlem Uzuner
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf bib
Role of Local Context in Automatic Deidentification of Ungrammatical, Fragmented Text
Tawanda Sibanda | Ozlem Uzuner | Ozlem Uzuner
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

pdf bib
Using Syntactic Information to Identify Plagiarism
Özlem Uzuner | Boris Katz | Thade Nahnsen
Proceedings of the Second Workshop on Building Educational Applications Using NLP

pdf bib
A Comparative Study of Language Models for Book and Author Recognition
Özlem Uzuner | Boris Katz
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection
Thade Nahnsen | Özlem Uzuner | Boris Katz
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts