2024
Identifying Fairness Issues in Automatically Generated Testing Content
Kevin Stowe, Benny Longwill, Alyssa Francis, Tatsuya Aoyama, Debanjan Ghosh, Swapna Somasundaran
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)
Natural language generation tools are powerful and effective for generating content. However, language models are known to display bias and fairness issues, making them impractical to deploy for many use cases. We here focus on how fairness issues impact automatically generated test content, which can have stringent requirements to ensure the test measures only what it was intended to measure. Specifically, we review test content generated for a large-scale standardized English proficiency test with the goal of identifying content that only pertains to a certain subset of the test population as well as content that has the potential to be upsetting or distracting to some test takers. Issues like these could inadvertently impact a test taker’s score and thus should be avoided. This kind of content does not reflect the more commonly-acknowledged biases, making it challenging even for modern models that contain safeguards. We build a dataset of 601 generated texts annotated for fairness and explore a variety of methods for classification: fine-tuning, topic-based classification, and prompting, including few-shot and self-correcting prompts. We find that combining prompt self-correction and few-shot learning performs best, yielding an F1 score of 0.79 on our held-out test set, while much smaller BERT- and topic-based models have competitive performance on out-of-domain data.
LEAF: Language Learners’ English Essays and Feedback Corpus
Shabnam Behzad, Omid Kashefi, Swapna Somasundaran
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
This paper addresses the issue of automated feedback generation for English language learners by presenting a corpus of English essays and their corresponding feedback, called LEAF, collected from the “essayforum” website. The corpus comprises approximately 6K essay-feedback pairs, offering a diverse and valuable resource for developing personalized feedback generation systems that address the critical deficiencies within essays, spanning from rectifying grammatical errors to offering insights on argumentative aspects and organizational coherence. Using this corpus, we present and compare multiple feedback generation baselines. Our findings shed light on the challenges of providing personalized feedback and highlight the potential of the LEAF corpus in advancing automated essay evaluation.
Assessing Online Writing Feedback Resources: Generative AI vs. Good Samaritans
Shabnam Behzad, Omid Kashefi, Swapna Somasundaran
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Providing constructive feedback on student essays is a critical factor in improving educational results; however, it presents notable difficulties and may demand substantial time investments, especially when aiming to deliver individualized and informative guidance. This study undertakes a comparative analysis of two readily available online resources for students seeking to hone their skills in essay writing for English proficiency tests: 1) essayforum.com, a widely used platform where students can submit their essays and receive feedback from volunteer educators at no cost, and 2) Large Language Models (LLMs) such as ChatGPT. By contrasting the feedback obtained from these two resources, we posit that they can mutually reinforce each other and are more helpful if employed in conjunction when seeking no-cost online assistance. The findings of this research shed light on the challenges of providing personalized feedback and highlight the potential of AI in advancing the field of automated essay evaluation.
pdf
bib
abs
When Argumentation Meets Cohesion: Enhancing Automatic Feedback in Student Writing
Yuning Ding, Omid Kashefi, Swapna Somasundaran, Andrea Horbach
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In this paper, we investigate the role of arguments in the automatic scoring of cohesion in argumentative essays. The feature analysis reveals that in argumentative essays, the lexical cohesion between claims is more important to the overall cohesion, while the evidence is expected to be diverse and divergent. Our results show that combining features related to argument segments and cohesion features improves the performance of the automatic cohesion scoring model trained on a transformer. The cohesion score is also learned more accurately in a multi-task learning process by adding the automatic segmentation of argumentative elements as an auxiliary task. Our findings contribute to both the understanding of cohesion in argumentative writing and the development of automatic feedback.
2023
Argument Detection in Student Essays under Resource Constraints
Omid Kashefi, Sophia Chan, Swapna Somasundaran
Proceedings of the 10th Workshop on Argument Mining
Learning to make effective arguments is vital for the development of critical-thinking in students and, hence, for their academic and career success. Detecting argument components is crucial for developing systems that assess students’ ability to develop arguments. Traditionally, supervised learning has been used for this task, but this requires a large corpus of reliable training examples which are often impractical to obtain for student writing. Large language models have also been shown to be effective few-shot learners, making them suitable for low-resource argument detection. However, concerns such as latency, service reliability, and data privacy might hinder their practical applicability. To address these challenges, we present a low-resource classification approach that combines the intrinsic entailment relationship among the argument elements with a parameter-efficient prompt-tuning strategy. Experimental results demonstrate the effectiveness of our method in reducing the data and computation requirements of training an argument detection model without compromising the prediction accuracy. This suggests the practical applicability of our model across a variety of real-world settings, facilitating broader access to argument classification for researchers spanning various domains and problem scenarios.
2022
AGReE: A system for generating Automated Grammar Reading Exercises
Sophia Chan, Swapna Somasundaran, Debanjan Ghosh, Mengxuan Zhao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
We describe the AGReE system, which takes user-submitted passages as input and automatically generates grammar practice exercises that can be completed while reading. Multiple-choice practice items are generated for a variety of grammar constructs: punctuation, articles, conjunctions, pronouns, prepositions, verbs, and nouns. We also conducted a large-scale human evaluation with around 4,500 multiple-choice practice items. We find that for 95% of items, a majority of the five raters were able to identify the correct answer, and for 85% of items, raters agreed that there was only one correct answer among the choices. Finally, error analysis shows that raters made the most mistakes on punctuation and conjunction items.
2021
Training and Domain Adaptation for Supervised Text Segmentation
Goran Glavaš, Ananya Ganesh, Swapna Somasundaran
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications
Unlike traditional unsupervised text segmentation methods, recent supervised segmentation models rely on Wikipedia as the source of large-scale segmentation supervision. These models have, however, predominantly been evaluated on the in-domain (Wikipedia-based) test sets, preventing conclusions about their general segmentation efficacy. In this work, we focus on the domain transfer performance of supervised neural text segmentation in the educational domain. To this end, we first introduce K12Seg, a new dataset for evaluation of supervised segmentation, created from educational reading material for grade-1 to college-level students. We then benchmark a hierarchical text segmentation model (HITS), based on RoBERTa, in both in-domain and domain-transfer segmentation experiments. While HITS produces state-of-the-art in-domain performance (on three Wikipedia-based test sets), we show that, subject to the standard full-blown fine-tuning, it is susceptible to domain overfitting. We identify adapter-based fine-tuning as a remedy that substantially improves transfer performance.
2020
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)
Dmitry Ustalov, Swapna Somasundaran, Alexander Panchenko, Fragkiskos D. Malliaros, Ioana Hulpuș, Peter Jansen, Abhik Jana
Emotion Arcs of Student Narratives
Swapna Somasundaran, Xianyang Chen, Michael Flor
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events
This paper studies emotion arcs in student narratives. We construct emotion arcs based on event affect and implied sentiments, which correspond to plot elements in the story. We show that student narratives can show elements of plot structure in their emotion arcs and that properties of these arcs can be useful indicators of narrative quality. We build a system and perform analysis to show that our arc-based features are complementary to previously studied sentiment features in this area.
2019
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)
Dmitry Ustalov, Swapna Somasundaran, Peter Jansen, Goran Glavaš, Martin Riedl, Mihai Surdeanu, Michalis Vazirgiannis
Lexical concreteness in narrative
Michael Flor, Swapna Somasundaran
Proceedings of the Second Workshop on Storytelling
This study explores the relation between lexical concreteness and narrative text quality. We present a methodology to quantitatively measure lexical concreteness of a text. We apply it to a corpus of student stories, scored according to writing evaluation rubrics. Lexical concreteness is weakly-to-moderately related to story quality, depending on story-type. The relation is mostly borne by adjectives and nouns, but also found for adverbs and verbs.
2018
Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12)
Goran Glavaš, Swapna Somasundaran, Martin Riedl, Eduard Hovy
Towards Evaluating Narrative Quality In Student Writing
Swapna Somasundaran, Michael Flor, Martin Chodorow, Hillary Molloy, Binod Gyawali, Laura McCulla
Transactions of the Association for Computational Linguistics, Volume 6
This work lays the foundation for automated assessments of narrative quality in student writing. We first manually score essays for narrative-relevant traits and sub-traits, and measure inter-annotator agreement. We then explore linguistic features that are indicative of good narrative writing and use them to build an automated scoring system. Experiments show that our features are more effective in scoring specific aspects of narrative quality than a state-of-the-art feature set.
2017
Sentiment Analysis and Lexical Cohesion for the Story Cloze Task
Michael Flor, Swapna Somasundaran
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics
We present two NLP components for the Story Cloze Task – dictionary-based sentiment analysis and lexical cohesion. While previous research found no contribution from sentiment analysis to the accuracy on this task, we demonstrate that sentiment is an important aspect. We describe a new approach, using a rule that estimates sentiment congruence in a story. Our sentiment-based system achieves strong results on this task. Our lexical cohesion system achieves accuracy comparable to previously published baseline results. A combination of the two systems achieves better accuracy than published baselines. We argue that sentiment analysis should be considered an integral part of narrative comprehension.
Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing
Martin Riedl, Swapna Somasundaran, Goran Glavaš, Eduard Hovy
2016
Evaluating Argumentative and Narrative Essays using Graphs
Swapna Somasundaran, Brian Riordan, Binod Gyawali, Su-Youn Yoon
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
This work investigates whether the development of ideas in writing can be captured by graph properties derived from the text. Focusing on student essays, we represent the essay as a graph, and encode a variety of graph properties including PageRank as features for modeling essay scores related to quality of development. We demonstrate that our approach improves on a state-of-the-art system on the task of holistic scoring of persuasive essays and on the task of scoring narrative essays along the development dimension.
2015
Automated Scoring of Picture-based Story Narration
Swapna Somasundaran, Chong Min Lee, Martin Chodorow, Xinhao Wang
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications
Scoring Persuasive Essays Using Opinions and their Targets
Noura Farra, Swapna Somasundaran, Jill Burstein
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications
2014
Content Importance Models for Scoring Writing From Sources
Beata Beigman Klebanov, Nitin Madnani, Jill Burstein, Swapna Somasundaran
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Lexical Chaining for Measuring Discourse Coherence Quality in Test-taker Essays
Swapna Somasundaran, Jill Burstein, Martin Chodorow
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
Automated Measures of Specific Vocabulary Knowledge from Constructed Responses (‘Use These Words to Write a Sentence Based on this Picture’)
Swapna Somasundaran, Martin Chodorow
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications
Finding your “Inner-Annotator”: An Experiment in Annotator Independence for Rating Discourse Coherence Quality in Essays
Jill Burstein, Swapna Somasundaran, Martin Chodorow
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop
2011
A Combination of Topic Models with Max-margin Learning for Relation Detection
Dingcheng Li, Swapna Somasundaran, Amit Chakraborty
Proceedings of TextGraphs-6: Graph-based Methods for Natural Language Processing
2010
Recognizing Stances in Ideological On-Line Debates
Swapna Somasundaran, Janyce Wiebe
Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text
Proceedings of TextGraphs-5 - 2010 Workshop on Graph-based Methods for Natural Language Processing
Carmen Banea, Alessandro Moschitti, Swapna Somasundaran, Fabio Massimo Zanzotto
2009
Recognizing Stances in Online Debates
Swapna Somasundaran, Janyce Wiebe
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
Supervised and Unsupervised Methods in Employing Discourse Relations for Improving Opinion Polarity Classification
Swapna Somasundaran, Galileo Namata, Janyce Wiebe, Lise Getoor
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
Opinion Graphs for Polarity and Discourse Classification
Swapna Somasundaran, Galileo Namata, Lise Getoor, Janyce Wiebe
Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (TextGraphs-4)
2008
Finding the Sources and Targets of Subjective Expressions
Josef Ruppenhofer, Swapna Somasundaran, Janyce Wiebe
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
As many popular text genres such as blogs or news contain opinions by multiple sources and about multiple targets, finding the sources and targets of subjective expressions becomes an important sub-task for automatic opinion analysis systems. We argue that while automatic semantic role labeling systems (ASRL) have an important contribution to make, they cannot solve the problem for all cases. Based on the experience of manually annotating opinions, sources, and targets in various genres, we present linguistic phenomena that require knowledge beyond that of ASRL systems. In particular, we address issues relating to the attribution of opinions to sources; sources and targets that are realized as zero-forms; and inferred opinions. We also discuss in some depth that for arguing attitudes we need to be able to recover propositions and not only argued-about entities. A recurrent theme of the discussion is that close attention to specific discourse contexts is needed to identify sources and targets correctly.
Discourse Level Opinion Relations: An Annotation Study
Swapna Somasundaran, Josef Ruppenhofer, Janyce Wiebe
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue
Discourse Level Opinion Interpretation
Swapna Somasundaran, Janyce Wiebe, Josef Ruppenhofer
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)
2007
Detecting Arguing and Sentiment in Meetings
Swapna Somasundaran, Josef Ruppenhofer, Janyce Wiebe
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue
2006
Manual Annotation of Opinion Categories in Meetings
Swapna Somasundaran, Janyce Wiebe, Paul Hoffmann, Diane Litman
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006
2005
OpinionFinder: A System for Subjectivity Analysis
Theresa Wilson, Paul Hoffmann, Swapna Somasundaran, Jason Kessler, Janyce Wiebe, Yejin Choi, Claire Cardie, Ellen Riloff, Siddharth Patwardhan
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations