Lillian Lee


2023

Do Androids Laugh at Electric Sheep? Humor “Understanding” Benchmarks from The New Yorker Caption Contest
Jack Hessel | Ana Marasović | Jena D. Hwang | Lillian Lee | Jeff Da | Rowan Zellers | Robert Mankoff | Yejin Choi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large neural networks can now generate jokes, but do they really “understand” humor? We challenge AI models with three tasks derived from the New Yorker Cartoon Caption Contest: matching a joke to a cartoon, identifying a winning caption, and explaining why a winning caption is funny. These tasks encapsulate progressively more sophisticated aspects of “understanding” a cartoon; key elements are the complex, often surprising relationships between images and captions and the frequent inclusion of indirect and playful allusions to human experience and culture. We investigate both multimodal and language-only models: the former are challenged with the cartoon images directly, while the latter are given multifaceted descriptions of the visual scene to simulate human-level visual understanding. We find that both types of models struggle at all three tasks. For example, our best multimodal models fall 30 accuracy points behind human performance on the matching task, and, even when provided ground-truth visual scene descriptors, human-authored explanations are preferred head-to-head over the best machine-authored ones (few-shot GPT-4) in more than 2/3 of cases. We release models, code, leaderboard, and corpus, which includes newly-gathered annotations describing the image’s locations/entities, what’s unusual in the scene, and an explanation of the joke.

2022

War and Pieces: Comparing Perspectives About World War I and II Across Wikipedia Language Communities
Ana Smith | Lillian Lee
Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Wikipedia is widely used to train models for various tasks including semantic association, text generation, and translation. These tasks typically involve aligning and using text from multiple language editions, with the assumption that all versions of the article present the same content. But this assumption may not hold. We introduce a methodology for approximating the extent to which narratives of conflict may diverge in this scenario, focusing on articles about World War I and II battles written by Wikipedia’s communities of editors across four language editions. For simplicity, our unit of analysis representing each language community’s perspective is based on national entities and their subject-object-relation contexts, identified using named entity recognition and open-domain information extraction. Using a vector representation of these tuples, we evaluate how similarly the different language editions portray these entities and how often they mention them. Our results indicate that (1) language editions tend to reference their associated countries more and (2) the degree to which one language edition’s depiction overlaps with all the others varies.
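
As a rough illustration of the pipeline, the following minimal Python sketch uses spaCy’s named-entity recognizer plus a simplified subject-verb-object extractor as a stand-in for full open-domain information extraction; the model name, entity labels, and cosine-based overlap measure are illustrative assumptions rather than the paper’s exact setup.

    import numpy as np
    import spacy

    nlp = spacy.load("en_core_web_md")  # hypothetical choice; one pipeline per language edition

    def national_entity_tuples(text):
        """Collect (subject, relation, object) tuples anchored on national entities."""
        doc = nlp(text)
        nations = {e.text for e in doc.ents if e.label_ in ("GPE", "NORP")}
        tuples = []
        for token in doc:
            if token.pos_ == "VERB":
                subjects = [c for c in token.children if c.dep_ == "nsubj"]
                objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
                for s in subjects:
                    for o in objects:
                        if s.text in nations or o.text in nations:
                            tuples.append((s.text, token.lemma_, o.text))
        return tuples

    def edition_vector(article_texts):
        """Represent one language edition by averaging vectors of its tuple elements."""
        vecs = [nlp(w).vector for text in article_texts
                for tup in national_entity_tuples(text) for w in tup]
        if not vecs:
            return np.zeros(nlp.vocab.vectors_length)
        return np.mean(vecs, axis=0)

    def overlap(v1, v2):
        """Cosine similarity between two editions' tuple representations."""
        return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))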

2021

TGIF: Tree-Graph Integrated-Format Parser for Enhanced UD with Two-Stage Generic- to Individual-Language Finetuning
Tianze Shi | Lillian Lee
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)

We present our contribution to the IWPT 2021 shared task on parsing into enhanced Universal Dependencies. Our main system component is a hybrid tree-graph parser that integrates (a) predictions of spanning trees for the enhanced graphs with (b) additional graph edges not present in the spanning trees. We also adopt a finetuning strategy: we first train a language-generic parser on the concatenation of data from all available languages, and then, in a second step, finetune on each individual language separately. Additionally, we develop our own complete set of pre-processing modules relevant to the shared task, including tokenization, sentence segmentation, and multiword token expansion, based on pre-trained XLM-R models and our own pre-training of character-level language models. Our submission reaches a macro-average ELAS of 89.24 on the test set. It ranks first among all teams, with a margin of more than 2 absolute ELAS points over the next-best-performing submission, and achieves the best score on 16 out of 17 languages.
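
A schematic PyTorch sketch of the generic-to-individual finetuning schedule on toy data; the datasets, model, and hyperparameters below are placeholders, not the submission’s actual XLM-R-based components.

    import copy
    import torch
    from torch import nn
    from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

    torch.manual_seed(0)

    def toy_dataset(n=64, dim=8, classes=3):
        """Stand-in for one language's treebank."""
        return TensorDataset(torch.randn(n, dim), torch.randint(0, classes, (n,)))

    language_data = {"en": toy_dataset(), "fr": toy_dataset(), "de": toy_dataset()}
    model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 3))

    def train(model, dataset, epochs, lr):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in DataLoader(dataset, batch_size=16, shuffle=True):
                opt.zero_grad()
                nn.functional.cross_entropy(model(x), y).backward()
                opt.step()
        return model

    # Stage 1: a single language-generic model trained on the concatenated data.
    generic = train(model, ConcatDataset(list(language_data.values())), epochs=5, lr=1e-3)

    # Stage 2: a separate copy finetuned on each individual language,
    # typically with a smaller learning rate.
    per_language = {lang: train(copy.deepcopy(generic), data, epochs=2, lr=1e-4)
                    for lang, data in language_data.items()}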

Learning Syntax from Naturally-Occurring Bracketings
Tianze Shi | Ozan İrsoy | Igor Malioutov | Lillian Lee
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Naturally-occurring bracketings, such as answer fragments to natural language questions and hyperlinks on webpages, can reflect human syntactic intuition regarding phrasal boundaries. Their availability and approximate correspondence to syntax make them appealing as distant information sources to incorporate into unsupervised constituency parsing. But they are noisy and incomplete; to address this challenge, we develop a partial-brackets-aware structured ramp loss in learning. Experiments demonstrate that our distantly-supervised models trained on naturally-occurring bracketing data are more accurate in inducing syntactic structures than competing unsupervised systems. On the English WSJ corpus, our models achieve an unlabeled F1 score of 68.9 for constituency parsing.
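
In generic form (a sketch; the paper’s exact cost function and search space may differ), a partial-brackets-aware structured ramp loss for a sentence x with observed bracket set B and model score s_θ(x, y) over candidate trees y can be written as

    L(x, B) = max_{y ∈ Y(x)} [ s_θ(x, y) + Δ(y, B) ] − max_{y ∈ Y(x)} [ s_θ(x, y) − Δ(y, B) ]

where Δ(y, B) counts the observed brackets that y crosses or omits. Minimizing this loss pushes the model’s score down on high-scoring trees that violate the partial brackets (the first term) and up on high-scoring trees that respect them (the second), without requiring B to be complete or noise-free.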

Assessing Cognitive Linguistic Influences in the Assignment of Blame
Karen Zhou | Ana Smith | Lillian Lee
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media

Lab studies in cognition and the psychology of morality have proposed some thematic and linguistic factors that influence moral reasoning. This paper assesses how well the findings of these studies generalize to a large corpus of over 22,000 descriptions of fraught situations posted to a dedicated forum. At this social-media site, users judge whether or not an author is in the wrong with respect to the event that the author described. We find that, consistent with lab studies, there are statistically significant differences in uses of first-person passive voice, as well as first-person agents and patients, between descriptions of situations that receive different blame judgments. These features also aid performance in the task of predicting the eventual collective verdicts.
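
A minimal sketch of one such feature, assuming a spaCy English pipeline: flagging sentences whose grammatical subject is first-person and passive (“I was asked to...”). The trigger conditions are illustrative, not the paper’s exact feature definition.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def first_person_passive(sentence):
        """True if the sentence has a first-person passive subject."""
        doc = nlp(sentence)
        return any(tok.dep_ == "nsubjpass" and tok.lower_ in {"i", "we"}
                   for tok in doc)

    print(first_person_passive("I was asked to cover the bill."))   # True
    print(first_person_passive("I asked them to cover the bill."))  # False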

Transition-based Bubble Parsing: Improvements on Coordination Structure Prediction
Tianze Shi | Lillian Lee
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We propose a transition-based bubble parser to perform coordination structure identification and dependency-based syntactic analysis simultaneously. Bubble representations were proposed in the formal linguistics literature decades ago; they enhance dependency trees by encoding coordination boundaries and internal relationships within coordination structures explicitly. In this paper, we introduce a transition system and neural models for parsing these bubble-enhanced structures. Experimental results on the English Penn Treebank and the English GENIA corpus show that our parsers beat previous state-of-the-art approaches on the task of coordination structure prediction, especially for the subset of sentences with complex coordination structures.

2020

On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries
Tianze Shi | Chen Zhao | Jordan Boyd-Graber | Hal Daumé III | Lillian Lee
Findings of the Association for Computational Linguistics: EMNLP 2020

Large-scale semantic parsing datasets annotated with logical forms have enabled major advances in supervised approaches. But can richer supervision help even more? To explore the utility of fine-grained, lexical-level supervision, we introduce SQUALL, a dataset that enriches 11,276 WikiTableQuestions English-language questions with manually created SQL equivalents plus alignments between SQL and question fragments. Our annotation enables new training possibilities for encoder-decoder models, including approaches from machine translation previously precluded by the absence of alignments. We propose and test two methods: (1) supervised attention; (2) adopting an auxiliary objective of disambiguating references in the input queries to table columns. In 5-fold cross validation, these strategies improve over strong baselines by 4.4% execution accuracy. Oracle experiments suggest that annotated alignments can support further accuracy gains of up to 23.9%.
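
A sketch of the supervised-attention idea, assuming gold alignments are supplied as a binary (SQL token × question token) matrix and the decoder exposes its attention weights; the particular loss form here (cross-entropy against the row-normalized gold alignment) is one common choice, not necessarily the paper’s exact objective.

    import torch

    def supervised_attention_loss(attn, gold_align, eps=1e-8):
        """attn: (T_sql, T_q) attention weights, rows summing to 1.
        gold_align: (T_sql, T_q) binary alignment annotations."""
        mask = gold_align.sum(dim=1) > 0                 # rows with some alignment
        gold = gold_align[mask] / gold_align[mask].sum(dim=1, keepdim=True)
        return -(gold * torch.log(attn[mask] + eps)).sum(dim=1).mean()

    attn = torch.softmax(torch.randn(4, 6), dim=1)
    gold = torch.zeros(4, 6)
    gold[0, 2] = 1.0
    gold[1, 0] = 1.0
    gold[1, 1] = 1.0
    print(supervised_attention_loss(attn, gold))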

Extracting Headless MWEs from Dependency Parse Trees: Parsing, Tagging, and Joint Modeling Approaches
Tianze Shi | Lillian Lee
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

An interesting and frequent type of multi-word expression (MWE) is the headless MWE, for which there are no true internal syntactic dominance relations; examples include many named entities (“Wells Fargo”) and dates (“July 5, 2020”) as well as certain productive constructions (“blow for blow”, “day after day”). Despite their special status and prevalence, current dependency-annotation schemes require treating such flat structures as if they had internal syntactic heads, and most current parsers handle them in the same fashion as headed constructions. Meanwhile, outside the context of parsing, taggers are typically used for identifying MWEs, but taggers might benefit from structural information. We empirically compare these two common strategies—parsing and tagging—for predicting flat MWEs. Additionally, we propose an efficient joint decoding algorithm that combines scores from both strategies. Experimental results on the MWE-Aware English Dependency Corpus and on six non-English dependency treebanks with frequent flat structures show that: (1) tagging is more accurate than parsing for identifying flat-structure MWEs, (2) our joint decoder reconciles the two different views and, for non-BERT features, leads to higher accuracies, and (3) most of the gains result from feature sharing between the parsers and taggers.

Does my multimodal model learn cross-modal interactions? It’s harder to tell than you might think!
Jack Hessel | Lillian Lee
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Modeling expressive cross-modal interactions seems crucial in multimodal tasks, such as visual question answering. However, sometimes high-performing black-box algorithms turn out to be mostly exploiting unimodal signals in the data. We propose a new diagnostic tool, empirical multimodally-additive function projection (EMAP), for isolating whether or not cross-modal interactions improve performance for a given model on a given task. This function projection modifies model predictions so that cross-modal interactions are eliminated, isolating the additive, unimodal structure. For seven image+text classification tasks (on each of which we set new state-of-the-art benchmarks), we find that, in many cases, removing cross-modal interactions results in little to no performance degradation. Surprisingly, this holds even when expressive models, with capacity to consider interactions, otherwise outperform less expressive models; thus, performance improvements, even when present, often cannot be attributed to consideration of cross-modal feature interactions. We hence recommend that researchers in multimodal machine learning report the performance not only of unimodal baselines, but also the EMAP of their best-performing model.
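
For a model with a single scalar output, the empirical projection can be computed as below (a sketch following the paper’s description; multi-class models apply the same projection to each logit independently).

    import numpy as np

    def emap(predict, texts, images):
        """predict(t, v) -> scalar score; texts and images are the paired eval set."""
        # Score every text/image cross-pairing (the expensive O(n^2) step).
        f = np.array([[predict(t, v) for v in images] for t in texts])
        # Additive projection: row mean + column mean - grand mean.
        proj = f.mean(axis=1, keepdims=True) + f.mean(axis=0, keepdims=True) - f.mean()
        # EMAP predictions for the original pairs sit on the diagonal.
        return np.diag(proj)

Comparing a metric computed on emap(...) against the same metric computed on the model’s own paired predictions isolates how much of the performance depends on cross-modal interactions.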

2019

Something’s Brewing! Early Prediction of Controversy-causing Posts from Discussion Features
Jack Hessel | Lillian Lee
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Controversial posts are those that split the preferences of a community, receiving both significant positive and significant negative feedback. Our inclusion of the word “community” here is deliberate: what is controversial to some audiences may not be so to others. Using data from several different communities on reddit.com, we predict the ultimate controversiality of posts, leveraging features drawn from both the textual content and the tree structure of the early comments that initiate the discussion. We find that even when only a handful of comments are available, e.g., the first 5 comments made within 15 minutes of the original post, discussion features often add predictive capacity to strong content-and-rate-only baselines. Additional experiments on domain transfer suggest that conversation-structure features often generalize to other communities better than conversation-content features do.

Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel | Lillian Lee | David Mimno
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Images and text co-occur constantly on the web, but explicit links between images and sentences (or other intra-document textual units) are often not present. We present algorithms that discover image-sentence relationships without relying on explicit multimodal annotation in training. We experiment on seven datasets of varying difficulty, ranging from documents consisting of groups of images captioned post hoc by crowdworkers to naturally-occurring user-generated multimodal documents. We find that a structured training objective based on identifying whether collections of images and sentences co-occur in documents can suffice to predict links between specific sentences and specific images within the same document at test time.
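
One illustrative form such a structured objective can take (a sketch, not necessarily the paper’s exact formulation): score a document by letting each sentence pick its best-matching image, and train with a hinge loss against images drawn from a mismatched document; test-time sentence-image links fall out of the same similarity matrix.

    import torch

    def doc_score(sent_vecs, img_vecs):
        """(S, d) x (I, d): each sentence keeps its best-matching image."""
        sim = sent_vecs @ img_vecs.T
        return sim.max(dim=1).values.mean()

    def hinge_loss(sent_vecs, img_vecs, neg_img_vecs, margin=0.2):
        pos = doc_score(sent_vecs, img_vecs)
        neg = doc_score(sent_vecs, neg_img_vecs)
        return torch.clamp(margin - pos + neg, min=0.0)

    sents, imgs, negs = torch.randn(5, 64), torch.randn(3, 64), torch.randn(4, 64)
    print(hinge_loss(sents, imgs, negs))
    # At test time, sentence i is linked to argmax_j of (sents @ imgs.T)[i].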

Transactions of the Association for Computational Linguistics, Volume 7
Lillian Lee | Mark Johnson | Brian Roark | Ani Nenkova
Transactions of the Association for Computational Linguistics, Volume 7

2018

Transactions of the Association for Computational Linguistics, Volume 6
Lillian Lee | Mark Johnson | Kristina Toutanova | Brian Roark
Transactions of the Association for Computational Linguistics, Volume 6

Global Transition-based Non-projective Dependency Parsing
Carlos Gómez-Rodríguez | Tianze Shi | Lillian Lee
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Shi, Huang, and Lee (2017a) obtained state-of-the-art results for English and Chinese dependency parsing by combining dynamic-programming implementations of transition-based dependency parsers with a minimal set of bidirectional LSTM features. However, their results were limited to projective parsing. In this paper, we extend their approach to support non-projectivity by providing the first practical implementation of the MH₄ algorithm, an O(n⁴) mildly non-projective dynamic-programming parser with very high coverage on non-projective treebanks. To make MH₄ compatible with minimal transition-based feature sets, we introduce a transition-based interpretation of it in which parser items are mapped to sequences of transitions. We thus obtain the first implementation of global decoding for non-projective transition-based parsing, and demonstrate empirically that it is more effective than its projective counterpart in parsing a number of highly non-projective languages.

Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets
Jack Hessel | David Mimno | Lillian Lee
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the approach in four settings, ranging from image captions to images/text scraped from historical books. In addition to enabling explorations of concepts in multimodal datasets, our concreteness scores predict the capacity of machine learning algorithms to learn textual/visual relationships. We find that 1) concrete concepts are indeed easier to learn; 2) the large number of algorithms we consider have similar failure cases; 3) the precise positive relationship between concreteness and performance varies between datasets. We conclude with recommendations for using concreteness scores to facilitate future multimodal research.
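
A sketch of a nearest-neighbor concreteness score in the spirit described, assuming precomputed visual features and a set of associated words per image; the neighborhood size and base-rate normalization are illustrative choices rather than the paper’s exact recipe.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def concreteness(word, image_feats, image_words, k=25):
        """image_feats: (n, d) visual features; image_words: list of word sets."""
        has_word = np.array([word in ws for ws in image_words])
        if has_word.sum() < 2:
            return 0.0
        nn = NearestNeighbors(n_neighbors=k + 1).fit(image_feats)
        _, idx = nn.kneighbors(image_feats[has_word])
        # Fraction of visual neighbors (excluding self) that also carry the
        # word, normalized by the word's base rate across all images.
        neighbor_rate = has_word[idx[:, 1:]].mean()
        return float(neighbor_rate / has_word.mean())

High scores flag words whose images cluster tightly in visual-feature space, i.e., concepts with concrete visual manifestations.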

Improving Coverage and Runtime Complexity for Exact Inference in Non-Projective Transition-Based Dependency Parsers
Tianze Shi | Carlos Gómez-Rodríguez | Lillian Lee
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We generalize Cohen, Gómez-Rodríguez, and Satta’s (2011) parser to a family of non-projective transition-based dependency parsers allowing polynomial-time exact inference. This includes novel parsers with better coverage than Cohen et al. (2011), and even a variant that reduces time complexity to O(n⁶), improving over the known bounds in exact inference for non-projective transition-based parsing. We hope that this piece of theoretical work inspires the design of novel transition systems with better coverage and better run-time guarantees.

Valency-Augmented Dependency Parsing
Tianze Shi | Lillian Lee
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present a complete, automated, and efficient approach for utilizing valency analysis in making dependency parsing decisions. It includes extraction of valency patterns, a probabilistic model for tagging these patterns, and a joint decoding process that explicitly considers the number and types of each token’s syntactic dependents. On 53 treebanks representing 41 languages in the Universal Dependencies data, we find that incorporating valency information yields higher precision and F1 scores on the core arguments (subjects and complements) and functional relations (e.g., auxiliaries) that we employ for valency analysis. Precision on core arguments improves from 80.87 to 85.43. We further show that our approach can be applied to an ostensibly different formalism and dataset, Tree Adjoining Grammar as extracted from the Penn Treebank; there, we outperform the previous state-of-the-art labeled attachment score by 0.7. Finally, we explore the potential of extending valency patterns beyond their traditional domain by confirming their helpfulness in improving PP attachment decisions.
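
For the pattern-extraction step, a minimal sketch using the conllu package; the inventory of core and functional relations below is illustrative, not the paper’s exact set.

    from collections import Counter
    from conllu import parse_incr

    CORE = {"nsubj", "obj", "iobj", "csubj", "ccomp", "xcomp", "aux", "cop"}

    def valency_patterns(conllu_path):
        """Count (POS, sorted core-dependent relations) patterns in a treebank."""
        patterns = Counter()
        with open(conllu_path, encoding="utf-8") as f:
            for sent in parse_incr(f):
                deps = {}
                for tok in sent:
                    if tok["head"] and tok["deprel"] in CORE:
                        deps.setdefault(tok["head"], []).append(tok["deprel"])
                for tok in sent:
                    if isinstance(tok["id"], int):  # skip multiword-token ranges
                        pattern = tuple(sorted(deps.get(tok["id"], [])))
                        patterns[(tok["upos"], pattern)] += 1
        return patterns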

2017

Fast(er) Exact Decoding and Global Training for Transition-Based Dependency Parsing via a Minimal Feature Set
Tianze Shi | Liang Huang | Lillian Lee
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We first present a minimal feature set for transition-based dependency parsing, continuing a recent trend started by Kiperwasser and Goldberg (2016a) and Cross and Huang (2016a) of using bi-directional LSTM features. We plug our minimal feature set into the dynamic-programming framework of Huang and Sagae (2010) and Kuhlmann et al. (2011) to produce the first implementation of worst-case O(n³) exact decoders for arc-hybrid and arc-eager transition systems. With our minimal features, we also present O(n³) global training methods. Finally, using ensembles including our new parsers, we achieve the best unlabeled attachment score reported (to our knowledge) on the Chinese Treebank and the “second-best-in-class” result on the English Penn Treebank.

Transactions of the Association for Computational Linguistics, Volume 5
Lillian Lee | Mark Johnson | Kristina Toutanova
Transactions of the Association for Computational Linguistics, Volume 5

2016

Transactions of the Association for Computational Linguistics, Volume 4
Lillian Lee | Mark Johnson | Kristina Toutanova
Transactions of the Association for Computational Linguistics, Volume 4

2015

Transactions of the Association for Computational Linguistics, Volume 3
Michael Collins | Lillian Lee
Transactions of the Association for Computational Linguistics, Volume 3

2014

The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter
Chenhao Tan | Lillian Lee | Bo Pang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Corpus of Sentence-level Revisions in Academic Writing: A Step towards Understanding Statement Strength in Communication
Chenhao Tan | Lillian Lee
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Transactions of the Association for Computational Linguistics, Volume 2
Dekang Lin | Michael Collins | Lillian Lee
Transactions of the Association for Computational Linguistics, Volume 2

Is It All in the Phrasing? Computational Explorations in How We Say What We Say, and Why It Matters
Lillian Lee
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

Keynote: Language Adaptation
Lillian Lee
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

2012

Hedge Detection as a Lens on Framing in the GMO Debates: A Position Paper
Eunsol Choi | Chenhao Tan | Lillian Lee | Cristian Danescu-Niculescu-Mizil | Jennifer Spindel
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

You Had Me at Hello: How Phrasing Affects Memorability
Cristian Danescu-Niculescu-Mizil | Justin Cheng | Jon Kleinberg | Lillian Lee
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs
Cristian Danescu-Niculescu-Mizil | Lillian Lee
Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics

2010

(Invited Talk) Clueless: Explorations in Unsupervised, Knowledge-Lean Extraction of Lexical-Semantic Information
Lillian Lee
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

Don’t ‘Have a Clue’? Unsupervised Co-Learning of Downward-Entailing Operators.
Cristian Danescu-Niculescu-Mizil | Lillian Lee
Proceedings of the ACL 2010 Conference Short Papers

For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia
Mark Yatskar | Bo Pang | Cristian Danescu-Niculescu-Mizil | Lillian Lee
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Without a ’doubt’? Unsupervised Discovery of Downward-Entailing Operators
Cristian Danescu-Niculescu-Mizil | Lillian Lee | Richard Ducott
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

The Power of Negative Thinking: Exploiting Label Disagreement in the Min-cut Classification Framework
Mohit Bansal | Claire Cardie | Lillian Lee
Coling 2008: Companion volume: Posters

Using Very Simple Statistics for Review Search: An Exploration
Bo Pang | Lillian Lee
Coling 2008: Companion volume: Posters

2006

Get out the vote: Determining support or opposition from Congressional floor-debate transcripts
Matt Thomas | Bo Pang | Lillian Lee
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales
Bo Pang | Lillian Lee
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts
Bo Pang | Lillian Lee
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization
Regina Barzilay | Lillian Lee
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

2003

Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
Regina Barzilay | Lillian Lee
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

2002

A non-programming introduction to computer science via NLP, IR, and AI
Lillian Lee
Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics

Thumbs up? Sentiment Classification using Machine Learning Techniques
Bo Pang | Lillian Lee | Shivakumar Vaithyanathan
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

Bootstrapping Lexical Choice via Multiple-Sequence Alignment
Regina Barzilay | Lillian Lee
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

2000

Mostly-Unsupervised Statistical Segmentation of Japanese: Applications to Kanji
Rie Kubota Ando | Lillian Lee
1st Meeting of the North American Chapter of the Association for Computational Linguistics

Book Reviews: Foundations of Statistical Natural Language Processing
Lillian Lee
Computational Linguistics, Volume 26, Number 2, June 2000

1999

Measures of Distributional Similarity
Lillian Lee
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

Distributional Similarity Models: Clustering vs. Nearest Neighbors
Lillian Lee
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1997

Fast Context-Free Parsing Requires Fast Boolean Matrix Multiplication
Lillian Lee
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

Similarity-Based Methods for Word Sense Disambiguation
Ido Dagan | Lillian Lee | Fernando Pereira
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1994

Similarity-Based Estimation of Word Cooccurrence Probabilities
Ido Dagan | Fernando Pereira | Lillian Lee
32nd Annual Meeting of the Association for Computational Linguistics

1993

Distributional Clustering of English Words
Fernando Pereira | Naftali Tishby | Lillian Lee
31st Annual Meeting of the Association for Computational Linguistics