Chris Brew


2022

Domain-specific knowledge distillation yields smaller and better models for conversational commerce
Kristen Howell | Jian Wang | Akshay Hazare | Joseph Bradley | Chris Brew | Xi Chen | Matthew Dunn | Beth Hockey | Andrew Maurer | Dominic Widdows
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)

We demonstrate that knowledge distillation can be used not only to reduce model size, but to simultaneously adapt a contextual language model to a specific domain. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of Sanh et al. (2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. We show that for in-domain tasks, the domain-specific model shows on average 2.3% improvement in F1 score, relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the model improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages.

2020

Proceedings of the 3rd NLP4IF Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
Giovanni Da San Martino | Chris Brew | Giovanni Luca Ciampaglia | Anna Feldman | Chris Leberknight | Preslav Nakov
Proceedings of the 3rd NLP4IF Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

Abusive Language Detection using Syntactic Dependency Graphs
Kanika Narang | Chris Brew
Proceedings of the Fourth Workshop on Online Abuse and Harms

Automated detection of abusive language online has become imperative. Current sequential models (LSTM) do not work well for long and complex sentences, while bi-transformer models (BERT) are not computationally efficient for the task. We show that classifiers based on the syntactic structure of the text, dependency graph convolutional networks (DepGCNs), can achieve state-of-the-art performance on abusive language datasets. The overall performance is on par with that of strong baselines such as fine-tuned BERT. Further, our GCN-based approach is much more efficient than BERT at inference time, making it suitable for real-time detection.

2019

Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda
Anna Feldman | Giovanni Da San Martino | Alberto Barrón-Cedeño | Chris Brew | Chris Leberknight | Preslav Nakov
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

2018

Proceedings of the First Workshop on Natural Language Processing for Internet Freedom
Chris Brew | Anna Feldman | Chris Leberknight
Proceedings of the First Workshop on Natural Language Processing for Internet Freedom

Digital Operatives at SemEval-2018 Task 8: Using dependency features for malware NLP
Chris Brew
Proceedings of the 12th International Workshop on Semantic Evaluation

The four sub-tasks of SecureNLP build towards a capability for quickly highlighting critical information from malware reports, such as the specific actions taken by a malware sample. Digital Operatives (DO) submitted to sub-tasks 1 and 2, using standard text analysis technology (text classification for sub-task 1, and a CRF for sub-task 2). Performance is broadly competitive with other submitted systems on sub-task 1 and weak on sub-task 2. The annotation guidelines for the intermediate sub-tasks create a linkage to the final task, which is both an annotation challenge and a potentially useful feature of the task. The methods that DO chose do not attempt to make use of this linkage, which may be a missed opportunity. This motivates a post-hoc error analysis. It appears that the annotation task is very hard, and that in some cases both deep conceptual knowledge and substantial surrounding context are needed in order to correctly classify sentences.

2016

Classifying ReachOut posts with a radial basis function SVM
Chris Brew
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

2015

Natural Language Question Answering and Analytics for Diverse and Interlinked Datasets
Dezhao Song | Frank Schilder | Charese Smiley | Chris Brew
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

2013

SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge
Myroslava Dzikovska | Rodney Nielsen | Chris Brew | Claudia Leacock | Danilo Giampiccolo | Luisa Bentivogli | Peter Clark | Ido Dagan | Hoa Trang Dang
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

Towards Effective Tutorial Feedback for Explanation Questions: A Dataset and Baselines
Myroslava O. Dzikovska | Rodney D. Nielsen | Chris Brew
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Semantic Role Labeling Without Treebanks?
Stephen Boxwell | Chris Brew | Jason Baldridge | Dennis Mehay | Sujith Ravi
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

What a Parser Can Learn from a Semantic Role Labeler and Vice Versa
Stephen Boxwell | Dennis Mehay | Chris Brew
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

A Pilot Arabic CCGbank
Stephen A. Boxwell | Chris Brew
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We describe a process for converting the Penn Arabic Treebank into the CCG formalism. Previous efforts have yielded CCGbanks in English, German, and Turkish, thus opening these languages to the sophisticated computational tools developed for CCG and enabling further cross-linguistic development. Conversion from a context-free grammar treebank to a CCGbank is a four-stage process: head finding, argument classification, binarization, and category conversion. In the process of implementing a basic CCGbank conversion algorithm, we reveal properties of Arabic grammar that interfere with conversion, such as subject topicalization, genitive constructions, relative clauses, and optional pronominal subjects. All of these problematic phenomena can be resolved in a variety of ways; we discuss the advantages and disadvantages of each in their respective sections. We detail these and describe our categorial analysis of each of these Arabic grammatical phenomena in depth, as well as technical details on their integration into the conversion algorithm.

2009

Using the Wiktionary Graph Structure for Synonym Detection
Timothy Weale | Chris Brew | Eric Fosler-Lussier
Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web)

Brutus: A Semantic Role Labeling System Incorporating CCG, CFG, and Dependency Features
Stephen Boxwell | Dennis Mehay | Chris Brew
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

Which Are the Best Features for Automatic Verb Classification
Jianguo Li | Chris Brew
Proceedings of ACL-08: HLT

Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics
Martha Palmer | Chris Brew | Fei Xia
Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics

Statistical Identification of English Loanwords in Korean Using Automatically Generated Training Data
Kirk Baker | Chris Brew
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes an accurate, extensible method for automatically classifying unknown foreign words that requires minimal monolingual resources and no bilingual training data (which is often difficult to obtain for an arbitrary language pair). We use a small set of phonologically-based transliteration rules to generate a potentially unlimited amount of pseudo-data that can be used to train a classifier to distinguish etymological classes of actual words. We ran a series of experiments on identifying English loanwords in Korean, in order to explore the consequences of using pseudo-data in place of the original training data. Results show that a sufficient quantity of automatically generated training data, even produced by fairly low precision transliteration rules, can be used to train a classifier that performs within 0.3% of one trained on actual English loanwords (96% accuracy).

2007

BLEUÂTRE: flattening syntactic dependencies for MT evaluation
Dennis N. Mehay | Chris Brew
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

A shared task involving multi-label classification of clinical free text
John P. Pestian | Chris Brew | Pawel Matykiewicz | DJ Hovermale | Neil Johnson | K. Bretonnel Cohen | Wlodzislaw Duch
Biological, translational, and clinical language processing

2006

Tagging Portuguese with a Spanish Tagger
Jirka Hana | Anna Feldman | Luiz Amaral | Chris Brew
Proceedings of the Cross-Language Knowledge Induction Workshop

A Cross-language Approach to Rapid Creation of New Morpho-syntactically Annotated Resources
Anna Feldman | Jirka Hana | Chris Brew
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We take a novel approach to rapid, low-cost development of morpho-syntactically annotated resources without using parallel corpora or bilingual lexicons. The overall research question is how to exploit language resources and properties to facilitate and automate the creation of morphologically annotated corpora for new languages. This portability issue is especially relevant to minority languages, for which such resources are likely to remain unavailable in the foreseeable future. We compare the performance of our system on languages that belong to different language families (Romance vs. Slavic), as well as different language pairs within the same language family (Portuguese via Spanish vs. Catalan via Spanish). We show that across language families, the most difficult category is the category of nominals (the noun homonymy is challenging for morphological analysis and the order variation of adjectives within a sentence makes it challenging to create a reliable model), whereas different language families present different challenges with respect to their morpho-syntactic descriptions: for the Slavic languages, case is the most challenging category; for the Romance languages, gender is more challenging than case. In addition, we present an alternative evaluation metric for our system, where we measure how much human labor will be needed to convert the result of our tagging to a high precision annotated resource.

A Finite-State Model of Human Sentence Processing
Jihyun Park | Chris Brew
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Parsing and Subcategorization Data
Jianguo Li | Chris Brew
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL
Chris Brew | Dragomir Radev
Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL

“Language and Computers”: Creating an Introduction for a General Undergraduate Audience
Chris Brew | Markus Dickinson | W. Detmar Meurers
Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL

Robust Extraction of Subcategorization Data from Spoken Language
Jianguo Li | Chris Brew | Eric Fosler-Lussier
Proceedings of the Ninth International Workshop on Parsing Technology

Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing
Raymond Mooney | Chris Brew | Lee-Feng Chien | Katrin Kirchhoff
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Verb Class Disambiguation Using Informative Priors
Mirella Lapata | Chris Brew
Computational Linguistics, Volume 30, Number 1, March 2004

A Resource-light Approach to Russian Morphology: Tagging Russian using Czech resources
Jiri Hana | Anna Feldman | Chris Brew
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

A Distributional Model of Semantic Context Effects in Lexical Processing
Scott McDonald | Chris Brew
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Book Review: The Cambridge Grammar of the English Language by Rodney Huddleston and Geoffrey K. Pullum
Chris Brew
Computational Linguistics, Volume 29, Number 1, March 2003

2002

Spectral Clustering for German Verbs
Chris Brew | Sabine Schulte im Walde
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

Using the Segmentation Corpus to Define an Inventory of Concatenative Units for Cantonese Speech Synthesis
Wai Yi Peggy Wong | Chris Brew | Mary E. Beckman | Shui-duen Chan
COLING-02: The First SIGHAN Workshop on Chinese Language Processing

Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information
Sabine Schulte im Walde | Chris Brew
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

Stone soup translation
Paul C. Davis | Chris Brew
Proceedings of the 9th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2001

Book Reviews: Advances in Probabilistic and Other Parsing Technologies
Chris Brew
Computational Linguistics, Volume 27, Number 3, September 2001

1999

Using Subcategorization to Resolve Verb Class Ambiguity
Maria Lapata | Chris Brew
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1998

Error-Driven Learning of Chinese Word Segmentation
Julia Hockenmaier | Chris Brew
Proceedings of the 12th Pacific Asia Conference on Language, Information and Computation

1997

Using SGML as a Basis for Data-Intensive NLP
David McKelvie | Chris Brew | Henry Thompson
Fifth Conference on Applied Natural Language Processing

1995

Stochastic HPSG
Chris Brew
Seventh Conference of the European Chapter of the Association for Computational Linguistics

1994

Automatic Evaluation of Computer Generated Text: A Progress Report on the TextEval Project
Chris Brew | Henry S. Thompson
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

Priority Union and Generalization in Discourse Grammars
Claire Grover | Chris Brew | Suresh Manandhar | Marc Moens
32nd Annual Meeting of the Association for Computational Linguistics

1992

Letting the Cat Out of the Bag: Generation for Shake-and-Bake MT
Chris Brew
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics

1991

Systematic Classification and its Efficiency
Chris Brew
Computational Linguistics, Volume 17, Number 4, December 1991

1990

Partial Descriptions and Systemic Grammar
Chris Brew
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics