2014
ClearTK 2.0: Design Patterns for Machine Learning in UIMA
Steven Bethard | Philip Ogren | Lee Becker
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
ClearTK adds machine learning functionality to the UIMA framework, providing wrappers to popular machine learning libraries, a rich feature extraction library that works across different classifiers, and utilities for applying and evaluating machine learning models. Since its inception in 2008, ClearTK has evolved in response to feedback from developers and the community. This evolution has followed a number of important design principles, including conceptually simple annotator interfaces, readable pipeline descriptions, minimal collection readers, type-system-agnostic code, modules organized for ease of import, and assistance for users in comprehending the complex UIMA framework.
2010
Improving Syntactic Coordination Resolution using Language Modeling
Philip Ogren
Proceedings of the NAACL HLT 2010 Student Research Workshop
2009
High-precision biological event extraction with a concept recognizer
K. Bretonnel Cohen | Karin Verspoor | Helen Johnson | Chris Roeder | Philip Ogren | William Baumgartner | Elizabeth White | Lawrence Hunter
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task
Building Test Suites for UIMA Components
Philip Ogren | Steven Bethard
Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)
2008
System Evaluation on a Named Entity Corpus from Clinical Notes
Karin Schuler | Vinod Kaggal | James Masanz | Philip Ogren | Guergana Savova
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper presents the evaluation of the dictionary look-up component of Mayo Clinic's Information Extraction system. The component was tested on a corpus of 160 free-text clinical notes which were manually annotated with the named entity disease. This kind of clinical text presents many language challenges, such as fragmented sentences and heavy use of abbreviations and acronyms. The dictionary used for this evaluation was a subset of SNOMED-CT with semantic types corresponding to diseases/disorders, without any augmentation. The algorithm achieves an F-score of 0.56 for exact matches and F-scores of 0.76 and 0.62 for right- and left-partial matches, respectively. Machine learning techniques are currently under investigation to improve this task.
Constructing Evaluation Corpora for Automated Clinical Named Entity Recognition
Philip Ogren | Guergana Savova | Christopher Chute
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
We report on the construction of a gold-standard dataset consisting of annotated clinical notes suitable for evaluating our biomedical named entity recognition system. The dataset is the result of consensus between four human annotators and contains 1,556 annotations on 160 clinical notes using 658 unique concept codes from SNOMED-CT corresponding to human disorders. Inter-annotator agreement was calculated on annotations from 100 of the documents for span (90.9%), concept code (81.7%), context (84.8%), and status (86.0%) agreement. Complete agreement for span, concept code, context, and status was 74.6%. We found that creating a consensus set based on annotations from two independently-created annotation sets can reduce inter-annotator disagreement by 32.3%. We found little benefit to pre-annotating the corpus with a third-party named entity recognizer.
2006
Knowtator: A Protégé plug-in for annotated corpus construction
Philip V. Ogren
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Demonstrations
2005
Corpus Design for Biomedical Natural Language Processing
K. Bretonnel Cohen | Lynne Fox | Philip V. Ogren | Lawrence Hunter
Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics