Ben Wellner

Also published as: Benjamin Wellner


2019

MITRE at SemEval-2019 Task 5: Transfer Learning for Multilingual Hate Speech Detection
Abigail Gertner | John Henderson | Elizabeth Merkhofer | Amy Marsh | Ben Wellner | Guido Zarrella
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes MITRE’s participation in SemEval-2019 Task 5, HatEval: Multilingual detection of hate speech against immigrants and women in Twitter. The techniques explored range from simple bag-of-ngrams classifiers to neural architectures with varied attention mechanisms. We describe several styles of transfer learning from auxiliary tasks, including a novel method for adapting pre-trained BERT models to Twitter data. Logistic regression ties the systems together into an ensemble submitted for evaluation. The resulting system was used to produce predictions for all four HatEval subtasks, achieving the best mean rank of all teams that participated in all four conditions.

2009

Sources of Performance in CRF Transfer Training: a Business Name-tagging Case Study
Marc Vilain | Jonathan Huggins | Ben Wellner
Proceedings of the International Conference RANLP-2009

A simple feature-copying approach for long-distance dependencies
Marc Vilain | Jonathan Huggins | Ben Wellner
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

2008

SpatialML: Annotation Scheme, Corpora, and Tools
Inderjeet Mani | Janet Hitzeman | Justin Richer | Dave Harris | Rob Quimby | Ben Wellner
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

SpatialML is an annotation scheme for marking up references to places in natural language. It covers both named and nominal references to places, grounding them where possible with geo-coordinates, including both relative and absolute locations, and characterizes relationships among places in terms of a region calculus. A freely available annotation editor has been developed for SpatialML, along with a corpus of annotated documents released by the Linguistic Data Consortium. Inter-annotator agreement on SpatialML is 77.0 F-measure for extents on that corpus. An automatic tagger for SpatialML extents scores 78.5 F-measure. A disambiguator scores 93.0 F-measure and 93.4 Predictive Accuracy. In adapting the extent tagger to new domains, merging the training data from the above corpus with annotated data in the new domain provides the best performance.

2007

Automatically Identifying the Arguments of Discourse Connectives
Ben Wellner | James Pustejovsky
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Maytag: A Multi-Staged Approach to Identifying Complex Events in Textual Data
Conrad Chang | Lisa Ferro | John Gibson | Janet Hitzeman | Suzi Lubar | Justin Palmer | Sean Munson | Marc Vilain | Benjamin Wellner
Demonstrations

A Pilot Study on Acquiring Metric Temporal Constraints for Events
Inderjeet Mani | Ben Wellner
Proceedings of the Workshop on Annotating and Reasoning about Time and Events

Classification of Discourse Coherence Relations: An Exploratory Study using Multiple Knowledge Sources
Ben Wellner | James Pustejovsky | Catherine Havasi | Anna Rumshisky | Roser Saurí
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue

Leveraging Machine Readable Dictionaries in Discriminative Sequence Models
Ben Wellner | Marc Vilain
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)

Many natural language processing tasks make use of a lexicon: typically the words collected from some annotated training data, along with their associated properties. We demonstrate here the utility of corpus-independent lexicons derived from machine-readable dictionaries. Lexical information is encoded as features in a Conditional Random Field tagger, improving performance in cases where (i) only limited training data is available, (ii) the data is caseless, and (iii) the genre or domain of the test data differs from that of the training data. We show substantial error reductions, especially on unknown words, for the tasks of part-of-speech tagging and shallow parsing, achieving up to a 20% error reduction on Penn TreeBank part-of-speech tagging and up to a 15.7% error reduction for shallow parsing on the CoNLL 2000 data. Our results point toward a simple but effective methodology for increasing the adaptability of text processing systems: training models on annotated data in one genre, augmented with general lexical information or lexical information pertinent to the target genre (or domain).

Machine Learning of Temporal Relations
Inderjeet Mani | Marc Verhagen | Ben Wellner | Chong Min Lee | James Pustejovsky
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

Weakly Supervised Learning Methods for Improving the Quality of Gene Name Normalization Data
Ben Wellner
Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics

Adaptive String Similarity Metrics for Biomedical Reference Resolution
Ben Wellner | José Castaño | James Pustejovsky
Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics

2004

The MITRE logical form generation system
Samuel Bayer | John Burger | John Greiff | Ben Wellner
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

2001

Integrated Feasibility Experiment for Bio-Security: IFE-Bio, A TIDES Demonstration
Lynette Hirschman | Kris Concepcion | Laurie Damianos | David Day | John Delmore | Lisa Ferro | John Griffith | John Henderson | Jeff Kurtz | Inderjeet Mani | Scott Mardis | Tom McEntee | Keith Miller | Beverly Nunam | Jay Ponte | Florence Reeder | Ben Wellner | George Wilson | Alex Yeh
Proceedings of the First International Conference on Human Language Technology Research

2000

Considering Automatic Aids to Corpus Annotation
David Day | Benjamin Wellner
Proceedings of the COLING-2000 Workshop on Linguistically Interpreted Corpora