Dan Klein


2024

pdf bib
Ghostbuster: Detecting Text Ghostwritten by Large Language Models
Vivek Verma | Eve Fleisig | Nicholas Tomlin | Dan Klein
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

We introduce Ghostbuster, a state-of-the-art system for detecting AI-generated text.Our method works by passing documents through a series of weaker language models, running a structured search over possible combinations of their features, and then training a classifier on the selected features to predict whether documents are AI-generated.Crucially, Ghostbuster does not require access to token probabilities from the target model, making it useful for detecting text generated by black-box or unknown models.In conjunction with our model, we release three new datasets of human- and AI-generated text as detection benchmarks in the domains of student essays, creative writing, and news articles. We compare Ghostbuster to several existing detectors, including DetectGPT and GPTZero, as well as a new RoBERTa baseline. Ghostbuster achieves 99.0 F1 when evaluated across domains, which is 5.9 F1 higher than the best preexisting model. It also outperforms all previous approaches in generalization across writing domains (+7.5 F1), prompting strategies (+2.1 F1), and language models (+4.4 F1). We also analyze our system’s robustness to a variety of perturbations and paraphrasing attacks, and evaluate its performance on documents by non-native English speakers.

pdf bib
The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels
Eve Fleisig | Su Lin Blodgett | Dan Klein | Zeerak Talat
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Longstanding data labeling practices in machine learning involve collecting and aggregating labels from multiple annotators. But what should we do when annotators disagree? Though annotator disagreement has long been seen as a problem to minimize, new perspectivist approaches challenge this assumption by treating disagreement as a valuable source of information. In this position paper, we examine practices and assumptions surrounding the causes of disagreement–some challenged by perspectivist approaches, and some that remain to be addressed–as well as practical and normative challenges for work operating under these assumptions. We conclude with recommendations for the data labeling pipeline and avenues for future research engaging with subjectivity and disagreement.

pdf bib
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
Chancharik Mitra | Abrar Anwar | Rodolfo Corona | Dan Klein | Trevor Darrell | Jesse Thomason
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

When connecting objects and their language referents in an embodied 3D environment, it is important to note that: (1) an object can be better characterized by leveraging comparative information between itself and other objects, and (2) an object’s appearance can vary with camera position. As such, we present the Multi-view Approach to Grounding in Context (MAGiC) model, which selects an object referent based on language that distinguishes between two similar objects. By pragmatically reasoning over both objects and across multiple views of those objects, MAGiC improves over the state-of-the-art model on the SNARE object reference task with a relative error reduction of 12.9% (representing an absolute improvement of 2.7%). Ablation studies show that reasoning jointly over object referent candidates and multiple views of each object both contribute to improved accuracy. Code: https://github.com/rcorona/magic_snare/

pdf bib
Re-evaluating the Need for Visual Signals in Unsupervised Grammar Induction
Boyi Li | Rodolfo Corona | Karttikeya Mangalam | Catherine Chen | Daniel Flaherty | Serge Belongie | Kilian Weinberger | Jitendra Malik | Trevor Darrell | Dan Klein
Findings of the Association for Computational Linguistics: NAACL 2024

Are multimodal inputs necessary for grammar induction? Recent work has shown that multimodal training inputs can improve grammar induction. However, these improvements are based on comparisons to weak text-only baselines that were trained on relatively little textual data. To determine whether multimodal inputs are needed in regimes with large amounts of textual training data, we design a stronger text-only baseline, which we refer to as LC-PCFG. LC-PCFG is a C-PFCG that incorporates embeddings from text-only large language models (LLMs). We use a fixed grammar family to directly compare LC-PCFG to various multimodal grammar induction methods. We compare performance on four benchmark datasets. LC-PCFG provides an up to 17% relative improvement in Corpus-F1 compared to state-of-the-art multimodal grammar induction methods. LC-PCFG is also more computationally efficient, providing an up to 85% reduction in parameter count and 8.8× reduction in training time compared to multimodal approaches. These results suggest that multimodal inputs may not be necessary for grammar induction, and emphasize the importance of strong vision-free baselines for evaluating the benefit of multimodal approaches.

pdf bib
What Evidence Do Language Models Find Convincing?
Alexander Wan | Eric Wallace | Dan Klein
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Retrieval-augmented language models are being increasingly tasked with subjective, contentious, and conflicting queries such as “is aspartame linked to cancer”. To resolve these ambiguous queries, one must search through a large range of websites and consider “which, if any, of this evidence do I find convincing?”. In this work, we study how LLMs answer this question. In particular, we construct ConflictingQA, a dataset that pairs controversial queries with a series of real-world evidence documents that contain different facts (e.g., quantitative results), argument styles (e.g., appeals to authority), and answers (Yes or No). We use this dataset to perform sensitivity and counterfactual analyses to explore which text features most affect LLM predictions. Overall, we find that current models rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important such as whether a text contains scientific references or is written with a neutral tone. Taken together, these results highlight the importance of RAG corpus quality (e.g., the need to filter misinformation), and possibly even a shift in how LLMs are trained to better align with human judgements.

pdf bib
American Sign Language Handshapes Reflect Pressures for Communicative Efficiency
Kayo Yin | Terry Regier | Dan Klein
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Communicative efficiency is a key topic in linguistics and cognitive psychology, with many studies demonstrating how the pressure to communicate with minimal effort guides the form of natural language. However, this phenomenon is rarely explored in signed languages. This paper shows how handshapes in American Sign Language (ASL) reflect these efficiency pressures and provides new evidence of communicative efficiency in the visual-gestural modality.We focus on hand configurations in native ASL signs and signs borrowed from English to compare efficiency pressures from both ASL and English usage. First, we develop new methodologies to quantify the articulatory effort needed to produce handshapes and the perceptual effort required to recognize them. Then, we analyze correlations between communicative effort and usage statistics in ASL or English. Our findings reveal that frequent ASL handshapes are easier to produce and that pressures for communicative efficiency mostly come from ASL usage, rather than from English lexical borrowing.

2023

pdf bib
The Whole Truth and Nothing But the Truth: Faithful and Controllable Dialogue Response Generation with Dataflow Transduction and Constrained Decoding
Hao Fang | Anusha Balakrishnan | Harsh Jhamtani | John Bufe | Jean Crawford | Jayant Krishnamurthy | Adam Pauls | Jason Eisner | Jacob Andreas | Dan Klein
Findings of the Association for Computational Linguistics: ACL 2023

In a real-world dialogue system, generated text must be truthful and informative while remaining fluent and adhering to a prescribed style. Satisfying these constraints simultaneously isdifficult for the two predominant paradigms in language generation: neural language modeling and rule-based generation. We describe a hybrid architecture for dialogue response generation that combines the strengths of both paradigms. The first component of this architecture is a rule-based content selection model defined using a new formal framework called dataflow transduction, which uses declarative rules to transduce a dialogue agent’s actions and their results (represented as dataflow graphs) into context-free grammars representing the space of contextually acceptable responses. The second component is a constrained decoding procedure that uses these grammars to constrain the output of a neural language model, which selects fluent utterances. Our experiments show that this system outperforms both rule-based and learned approaches in human evaluations of fluency, relevance, and truthfulness.

pdf bib
PREADD: Prefix-Adaptive Decoding for Controlled Text Generation
Jonathan Pei | Kevin Yang | Dan Klein
Findings of the Association for Computational Linguistics: ACL 2023

We propose Prefix-Adaptive Decoding (PREADD), a flexible method for controlled text generation. Unlike existing methods that use auxiliary expert models to control for attributes, PREADD does not require an external model, instead relying on linearly combining output logits from multiple prompts. Specifically, PREADD contrasts the output logits generated using a raw prompt against those generated using a prefix-prepended prompt, enabling both positive and negative control with respect to any attribute encapsulated by the prefix. We evaluate PREADD on three tasks—toxic output mitigation, gender bias reduction, and sentiment control—and find that PREADD outperforms not only prompting baselines, but also an auxiliary-expert control method, by 12% or more in relative gain on our main metrics for each task.

pdf bib
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents
Catherine Chen | Zejiang Shen | Dan Klein | Gabriel Stanovsky | Doug Downey | Kyle Lo
Findings of the Association for Computational Linguistics: ACL 2023

Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers. Layout-infused LMs are often evaluated on documents with familiar layout features (e.g., papers from the same publisher), but in practice models encounter documents with unfamiliar distributions of layout features, such as new combinations of text sizes and styles, or new spatial configurations of textual elements. In this work we test whether layout-infused LMs are robust to layout distribution shifts. As a case study we use the task of scientific document structure recovery, segmenting a scientific paper into its structural categories (e.g., “title”, “caption”, “reference”). To emulate distribution shifts that occur in practice we re-partition the GROTOAP2 dataset. We find that under layout distribution shifts model performance degrades by up to 20 F1. Simple training strategies, such as increasing training diversity, can reduce this degradation by over 35% relative F1; however, models fail to reach in-distribution performance in any tested out-of-distribution conditions. This work highlights the need to consider layout distribution shifts during model evaluation, and presents a methodology for conducting such evaluations.

pdf bib
Decomposing Complex Queries for Tip-of-the-tongue Retrieval
Kevin Lin | Kyle Lo | Joseph Gonzalez | Dan Klein
Findings of the Association for Computational Linguistics: EMNLP 2023

When re-finding items, users who forget or are uncertain about identifying details often rely on creative strategies for expressing their information needs—complex queries that describe content elements (e.g., book characters or events), information beyond the document text (e.g., descriptions of book covers), or personal context (e.g., when they read a book). Standard retrieval models that rely on lexical or semantic overlap between query and document text are challenged in such retrieval settings, known as tip-of-the-tongue (TOT) retrieval. We introduce a simple but effective framework for handling such complex queries by decomposing the query with an LLM into individual clues routing those as subqueries to specialized retrievers, and ensembling the results. Our approach takes advantage of off-the-shelf retrievers (e.g., CLIP for retrieving images of book covers) or incorporate retriever-specific logic (e.g., date constraints). We show that our framework incorporating query decomposition into retrievers can improve gold book recall up to 6% absolute gain for Recall@5 on a new collection of 14,441 real-world query-book pairs from an online community for resolving TOT inquiries.

pdf bib
Improving Pacing in Long-Form Story Planning
Yichen Wang | Kevin Yang | Xiaoming Liu | Dan Klein
Findings of the Association for Computational Linguistics: EMNLP 2023

Existing LLM-based systems for writing long-form stories or story outlines frequently suffer from unnatural pacing, whether glossing over important events or over-elaborating on insignificant details, resulting in a jarring experience for the reader. We propose a **CONC**rete **O**utline **C**on**T**rol (CONCOCT) system to improve pacing when automatically generating story outlines. We first train a *concreteness evaluator* to judge which of two events is more concrete (low-level-detailed). This evaluator can then be used to control pacing in hierarchical outline generation; in this work, we explore a *vaguest-first* expansion procedure that aims for uniform pacing. We further use the evaluator to filter new outline items based on predicted concreteness. Compared to a baseline hierarchical outline generator, humans judge CONCOCT’s pacing to be more consistent over 57% of the time across multiple outline lengths; the gains also translate to downstream stories. All code, data, and models are open-sourced.

pdf bib
Revisiting Entropy Rate Constancy in Text
Vivek Verma | Nicholas Tomlin | Dan Klein
Findings of the Association for Computational Linguistics: EMNLP 2023

The uniform information density (UID) hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse. Early evidence in support of the UID hypothesis came from Genzel and Charniak (2002), which proposed an entropy rate constancy principle based on the probability of English text under n-gram language models. We re-evaluate the claims of Genzel and Charniak (2002) with neural language models, failing to find clear evidence in support of entropy rate constancy. We conduct a range of experiments across datasets, model sizes, and languages and discuss implications for the uniform information density hypothesis and linguistic theories of efficient communication more broadly.

pdf bib
Incorporating Worker Perspectives into MTurk Annotation Practices for NLP
Olivia Huang | Eve Fleisig | Dan Klein
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Current practices regarding data collection for natural language processing on Amazon Mechanical Turk (MTurk) often rely on a combination of studies on data quality and heuristics shared among NLP researchers. However, without considering the perspectives of MTurk workers, these approaches are susceptible to issues regarding workers’ rights and poor response quality. We conducted a critical literature review and a survey of MTurk workers aimed at addressing open questions regarding best practices for fair payment, worker privacy, data quality, and considering worker incentives. We found that worker preferences are often at odds with received wisdom among NLP researchers. Surveyed workers preferred reliable, reasonable payments over uncertain, very high payments; reported frequently lying on demographic questions; and expressed frustration at having work rejected with no explanation. We also found that workers view some quality control methods, such as requiring minimum response times or Master’s qualifications, as biased and largely ineffective. Based on the survey results, we provide recommendations on how future NLP studies may better account for MTurk workers’ experiences in order to respect workers’ rights and improve data quality.

pdf bib
Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Ruiqi Zhong | Charlie Snell | Dan Klein | Jason Eisner
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Can non-programmers annotate natural language utterances with complex programs that represent their meaning? We introduce APEL, a framework in which non-programmers select among candidate programs generated by a seed semantic parser (e.g., Codex). Since they cannot understand the candidate programs, we ask them to select indirectly by examining the programs’ input-ouput examples. For each utterance, APEL actively searches for a simple input on which the candidate programs tend to produce different outputs. It then asks the non-programmers only to choose the appropriate output, thus allowing us to infer which program is correct and could be used to fine-tune the parser. As a first case study, we recruited human non-programmers to use APEL to re-annotate SPIDER, a text-to-SQL dataset. Our approach achieved the same annotation accuracy as the original expert annotators (75%) and exposed many subtle errors in the original annotations.

pdf bib
When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks
Eve Fleisig | Rediet Abebe | Dan Klein
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Though majority vote among annotators is typically used for ground truth labels in machine learning, annotator disagreement in tasks such as hate speech detection may reflect systematic differences in opinion across groups, not noise. Thus, a crucial problem in hate speech detection is determining if a statement is offensive to the demographic group that it targets, when that group may be a small fraction of the annotator pool. We construct a model that predicts individual annotator ratings on potentially offensive text and combines this information with the predicted target group of the text to predict the ratings of target group members. We show gains across a range of metrics, including raising performance over the baseline by 22% at predicting individual annotators’ ratings and by 33% at predicting variance among annotators, which provides a metric for model uncertainty downstream. We find that annotators’ ratings can be predicted using their demographic information as well as opinions on online content, and that non-invasive questions on annotators’ online experiences minimize the need to collect demographic information when predicting annotators’ opinions.

pdf bib
Centering the Margins: Outlier-Based Identification of Harmed Populations in Toxicity Detection
Vyoma Raman | Eve Fleisig | Dan Klein
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

The impact of AI models on marginalized communities has traditionally been measured by identifying performance differences between specified demographic subgroups. Though this approach aims to center vulnerable groups, it risks obscuring patterns of harm faced by intersectional subgroups or shared across multiple groups. To address this, we draw on theories of marginalization from disability studies and related disciplines, which state that people farther from the norm face greater adversity, to consider the “margins” in the domain of toxicity detection. We operationalize the “margins” of a dataset by employing outlier detection to identify text about people with demographic attributes distant from the “norm”. We find that model performance is consistently worse for demographic outliers, with mean squared error (MSE) between outliers and non-outliers up to 70.4% worse across toxicity types. It is also worse for text outliers, with a MSE up to 68.4% higher for outliers than non-outliers. We also find text and demographic outliers to be particularly susceptible to errors in the classification of severe toxicity and identity attacks. Compared to analysis of disparities using traditional demographic breakdowns, we find that our outlier analysis frequently surfaces greater harms faced by a larger, more intersectional group, which suggests that outlier analysis is particularly beneficial for identifying harms against those groups.

pdf bib
Neural Unsupervised Reconstruction of Protolanguage Word Forms
Andre He | Nicholas Tomlin | Dan Klein
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a state-of-the-art neural approach to the unsupervised reconstruction of ancient word forms. Previous work in this domain used expectation-maximization to predict simple phonological changes between ancient word forms and their cognates in modern languages. We extend this work with neural models that can capture more complicated phonological and morphological changes. At the same time, we preserve the inductive biases from classical methods by building monotonic alignment constraints into the model and deliberately underfitting during the maximization step. We evaluate our performance on the task of reconstructing Latin from a dataset of cognates across five Romance languages, achieving a notable reduction in edit distance from the target word forms compared to previous methods.

pdf bib
DOC: Improving Long Story Coherence With Detailed Outline Control
Kevin Yang | Dan Klein | Nanyun Peng | Yuandong Tian
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose the Detailed Outline Control (DOC) framework for improving long-range plot coherence when automatically generating several-thousand-word-long stories. DOC consists of two complementary components: a detailed outliner and a detailed controller. The detailed outliner creates a more detailed, hierarchically structured outline, shifting creative burden from the main drafting procedure to the planning stage. The detailed controller ensures the more detailed outline is still respected during generation by controlling story passages to align with outline details. In human evaluations of automatically generated stories, DOC substantially outperforms a strong Re3 baseline (Yang et al., 2022) on plot coherence (22.5% absolute gain), outline relevance (28.2%), and interestingness (20.7%). Humans also judged DOC to be much more controllable in an interactive generation setting.

pdf bib
Modular Visual Question Answering via Code Generation
Sanjay Subramanian | Medhini Narasimhan | Kushal Khangaonkar | Kevin Yang | Arsha Nagrani | Cordelia Schmid | Andy Zeng | Trevor Darrell | Dan Klein
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present a framework that formulates visual question answering as modular code generation. In contrast to prior work on modular approaches to VQA, our approach requires no additional training and relies on pre-trained language models (LMs), visual models pre-trained on image-caption pairs, and fifty VQA examples used for in-context learning. The generated Python programs invoke and compose the outputs of the visual models using arithmetic and conditional logic. Our approach improves accuracy on the COVR dataset by at least 3% and on the GQA dataset by 2% compared to the few-shot baseline that does not employ code generation.

2022

pdf bib
Re3: Generating Longer Stories With Recursive Reprompting and Revision
Kevin Yang | Yuandong Tian | Nanyun Peng | Dan Klein
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

We consider the problem of automatically generating longer stories of over two thousand words. Compared to prior work on shorter stories, long-range plot coherence and relevance are more central challenges here. We propose the Recursive Reprompting and Revision framework (Re3) to address these challenges by (a) prompting a general-purpose language model to construct a structured overarching plan, and (b) generating story passages by repeatedly injecting contextual information from both the plan and current story state into a language model prompt. We then revise by (c) reranking different continuations for plot coherence and premise relevance, and finally (d) editing the best continuation for factual consistency. Compared to similar-length stories generated directly from the same base model, human evaluators judged substantially more of Re3’s stories as having a coherent overarching plot (by 14% absolute increase), and relevant to the given initial premise (by 20%).

pdf bib
Automated Crossword Solving
Eric Wallace | Nicholas Tomlin | Albert Xu | Kevin Yang | Eshaan Pathak | Matthew Ginsberg | Dan Klein
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles. Our system works by generating answer candidates for each crossword clue using neural question answering models and then combines loopy belief propagation with local search to find full puzzle solutions. Compared to existing approaches, our system improves exact puzzle accuracy from 57% to 82% on crosswords from The New York Times and obtains 99.9% letter accuracy on themeless puzzles. Our system also won first place at the top human crossword tournament, which marks the first time that a computer program has surpassed human performance at this event. To facilitate research on question answering and crossword solving, we analyze our system’s remaining errors and release a dataset of over six million question-answer pairs.

pdf bib
Learned Incremental Representations for Parsing
Nikita Kitaev | Thomas Lu | Dan Klein
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present an incremental syntactic representation that consists of assigning a single discrete label to each word in a sentence, where the label is predicted using strictly incremental processing of a prefix of the sentence, and the sequence of labels for a sentence fully determines a parse tree. Our goal is to induce a syntactic representation that commits to syntactic choices only as they are incrementally revealed by the input, in contrast with standard representations that must make output choices such as attachments speculatively and later throw out conflicting analyses. Our learned representations achieve 93.72 F1 on the Penn Treebank with as few as 5 bits per word, and at 8 bits per word they achieve 94.97 F1, which is comparable with other state of the art parsing models when using the same pre-trained embeddings. We also provide an analysis of the representations learned by our system, investigating properties such as the interpretable syntactic features captured by the system and mechanisms for deferred resolution of syntactic ambiguities.

pdf bib
Inferring Rewards from Language in Context
Jessy Lin | Daniel Fried | Dan Klein | Anca Dragan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In classic instruction following, language like “I’d like the JetBlue flight” maps to actions (e.g., selecting that flight). However, language also conveys information about a user’s underlying reward function (e.g., a general preference for JetBlue), which can allow a model to carry out desirable actions in new contexts. We present a model that infers rewards from language pragmatically: reasoning about how speakers choose utterances not only to elicit desired actions, but also to reveal information about their preferences. On a new interactive flight–booking task with natural language, our model more accurately infers rewards and predicts optimal actions in unseen environments, in comparison to past work that first maps language to actions (instruction following) and then maps actions to rewards (inverse reinforcement learning).

pdf bib
Voxel-informed Language Grounding
Rodolfo Corona | Shizhan Zhu | Dan Klein | Trevor Darrell
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Natural language applied to natural 2D images describes a fundamentally 3D world. We present the Voxel-informed Language Grounder (VLG), a language grounding model that leverages 3D geometric information in the form of voxel maps derived from the visual input using a volumetric reconstruction model. We show that VLG significantly improves grounding accuracy on SNARE, an object reference game task. At the time of writing, VLG holds the top place on the SNARE leaderboard, achieving SOTA results with a 2.0% absolute improvement.

pdf bib
Understanding Game-Playing Agents with Natural Language Annotations
Nicholas Tomlin | Andre He | Dan Klein
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present a new dataset containing 10K human-annotated games of Go and show how these natural language annotations can be used as a tool for model interpretability. Given a board state and its associated comment, our approach uses linear probing to predict mentions of domain-specific terms (e.g., ko, atari) from the intermediate state representations of game-playing agents like AlphaGo Zero. We find these game concepts are nontrivially encoded in two distinct policy networks, one trained via imitation learning and another trained via reinforcement learning. Furthermore, mentions of domain-specific terms are most easily predicted from the later layers of both models, suggesting that these policy networks encode high-level abstractions similar to those used in the natural language annotations.

2021

pdf bib
Modular Networks for Compositional Instruction Following
Rodolfo Corona | Daniel Fried | Coline Devin | Dan Klein | Trevor Darrell
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Standard architectures used in instruction following often struggle on novel compositions of subgoals (e.g. navigating to landmarks or picking up objects) observed during training. We propose a modular architecture for following natural language instructions that describe sequences of diverse subgoals. In our approach, subgoal modules each carry out natural language instructions for a specific subgoal type. A sequence of modules to execute is chosen by learning to segment the instructions and predicting a subgoal type for each segment. When compared to standard, non-modular sequence-to-sequence approaches on ALFRED, a challenging instruction following benchmark, we find that modularization improves generalization to novel subgoal compositions, as well as to environments unseen in training.

pdf bib
Detoxifying Language Models Risks Marginalizing Minority Voices
Albert Xu | Eshaan Pathak | Eric Wallace | Suchin Gururangan | Maarten Sap | Dan Klein
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Language models (LMs) must be both safe and equitable to be responsibly deployed in practice. With safety in mind, numerous detoxification techniques (e.g., Dathathri et al. 2020; Krause et al. 2020) have been proposed to mitigate toxic LM generations. In this work, we show that these detoxification techniques hurt equity: they decrease the utility of LMs on language used by marginalized groups (e.g., African-American English and minority identity mentions). In particular, we perform automatic and human evaluations of text generation quality when LMs are conditioned on inputs with different dialects and group identifiers. We find that detoxification makes LMs more brittle to distribution shift, especially on language used by marginalized groups. We identify that these failures stem from detoxification methods exploiting spurious correlations in toxicity datasets. Overall, our results highlight the tension between the controllability and distributional robustness of LMs.

pdf bib
FUDGE: Controlled Text Generation With Future Discriminators
Kevin Yang | Dan Klein
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We propose Future Discriminators for Generation (FUDGE), a flexible and modular method for controlled text generation. Given a pre-existing model G for generating text from a distribution of interest, FUDGE enables conditioning on a desired attribute a (for example, formality) while requiring access only to G’s output logits. FUDGE learns an attribute predictor operating on a partial sequence, and uses this predictor’s outputs to adjust G’s original probabilities. We show that FUDGE models terms corresponding to a Bayesian decomposition of the conditional distribution of G given attribute a. Moreover, FUDGE can easily compose predictors for multiple desired attributes. We evaluate FUDGE on three tasks — couplet completion in poetry, topic control in language generation, and formality change in machine translation — and observe gains in all three tasks.

pdf bib
Constructing Taxonomies from Pretrained Language Models
Catherine Chen | Kevin Lin | Dan Klein
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We present a method for constructing taxonomic trees (e.g., WordNet) using pretrained language models. Our approach is composed of two modules, one that predicts parenthood relations and another that reconciles those pairwise predictions into trees. The parenthood prediction module produces likelihood scores for each potential parent-child pair, creating a graph of parent-child relation scores. The tree reconciliation module treats the task as a graph optimization problem and outputs the maximum spanning tree of this graph. We train our model on subtrees sampled from WordNet, and test on nonoverlapping WordNet subtrees. We show that incorporating web-retrieved glosses can further improve performance. On the task of constructing subtrees of English WordNet, the model achieves 66.7 ancestor F1, a 20.0% relative increase over the previous best published result on this task. In addition, we convert the original English dataset into nine other languages using Open Multilingual WordNet and extend our results across these languages.

pdf bib
Interactive Assignments for Teaching Structured Neural NLP
David Gaddy | Daniel Fried | Nikita Kitaev | Mitchell Stern | Rodolfo Corona | John DeNero | Dan Klein
Proceedings of the Fifth Workshop on Teaching NLP

We present a set of assignments for a graduate-level NLP course. Assignments are designed to be interactive, easily gradable, and to give students hands-on experience with several key types of structure (sequences, tags, parse trees, and logical forms), modern neural architectures (LSTMs and Transformers), inference algorithms (dynamic programs and approximate search) and training methods (full and weak supervision). We designed assignments to build incrementally both within each assignment and across assignments, with the goal of enabling students to undertake graduate-level research in NLP by the end of the course.

pdf bib
Reference-Centric Models for Grounded Collaborative Dialogue
Daniel Fried | Justin Chiu | Dan Klein
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We present a grounded neural dialogue model that successfully collaborates with people in a partially-observable reference game. We focus on a setting where two agents each observe an overlapping part of a world context and need to identify and agree on some object they share. Therefore, the agents should pool their information and communicate pragmatically to solve the task. Our dialogue agent accurately grounds referents from the partner’s utterances using a structured reference resolver, conditions on these referents using a recurrent memory, and uses a pragmatic generation procedure to ensure the partner can resolve the references the agent produces. We evaluate on the OneCommon spatial grounding dialogue task (Udagawa and Aizawa 2019), involving a number of dots arranged on a board with continuously varying positions, sizes, and shades. Our agent substantially outperforms the previous state of the art for the task, obtaining a 20% relative improvement in successful task completion in self-play evaluations and a 50% relative improvement in success in human evaluations.

pdf bib
Constrained Language Models Yield Few-Shot Semantic Parsers
Richard Shin | Christopher Lin | Sam Thomson | Charles Chen | Subhro Roy | Emmanouil Antonios Platanios | Adam Pauls | Dan Klein | Jason Eisner | Benjamin Van Durme
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We explore the use of large pretrained language models as few-shot semantic parsers. The goal in semantic parsing is to generate a structured meaning representation given a natural language input. However, language models are trained to generate natural language. To bridge the gap, we use language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation. Our results demonstrate that with only a small amount of data and very little code to convert into English-like representations, our blueprint for rapidly bootstrapping semantic parsers leads to surprisingly effective performance on multiple community tasks, greatly exceeding baseline methods also trained on the same limited data.

pdf bib
Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level
Ruiqi Zhong | Dhruba Ghosh | Dan Klein | Jacob Steinhardt
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections
Ruiqi Zhong | Kristy Lee | Zheng Zhang | Dan Klein
Findings of the Association for Computational Linguistics: EMNLP 2021

Large pre-trained language models (LMs) such as GPT-3 have acquired a surprising ability to perform zero-shot learning. For example, to classify sentiment without any training examples, we can “prompt” the LM with the review and the label description “Does the user like this movie?”, and ask whether the next word is “yes” or “no”. However, the next word prediction training objective is still misaligned with the target zero-shot learning objective. To address this weakness, we propose meta-tuning, which directly optimizes the zero-shot learning objective by fine-tuning pre-trained language models on a collection of datasets. We focus on classification tasks, and construct the meta-dataset by aggregating 43 existing datasets and annotating 441 label descriptions in a question-answering (QA) format. When evaluated on unseen tasks, meta-tuned models outperform a same-sized QA model and the previous SOTA zero-shot learning system based on natural language inference. Additionally, increasing parameter count from 220M to 770M improves AUC-ROC scores by 6.3%, and we forecast that even larger models would perform better. Therefore, measuring zero-shot learning performance on language models out-of-the-box might underestimate their true potential, and community-wide efforts on aggregating datasets and unifying their formats can help build models that answer prompts better.

pdf bib
Value-Agnostic Conversational Semantic Parsing
Emmanouil Antonios Platanios | Adam Pauls | Subhro Roy | Yuchen Zhang | Alexander Kyte | Alan Guo | Sam Thomson | Jayant Krishnamurthy | Jason Wolfe | Jacob Andreas | Dan Klein
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Conversational semantic parsers map user utterances to executable programs given dialogue histories composed of previous utterances, programs, and system responses. Existing parsers typically condition on rich representations of history that include the complete set of values and computations previously discussed. We propose a model that abstracts over values to focus prediction on type- and function-level context. This approach provides a compact encoding of dialogue histories and predicted programs, improving generalization and computational efficiency. Our model incorporates several other components, including an atomic span copy operation and structural enforcement of well-formedness constraints on predicted programs, that are particularly advantageous in the low-data regime. Trained on the SMCalFlow and TreeDST datasets, our model outperforms prior work by 7.3% and 10.6% respectively in terms of absolute accuracy. Trained on only a thousand examples from each dataset, it outperforms strong baselines by 12.4% and 6.4%. These results indicate that simple representations are key to effective generalization in conversational semantic parsing.

pdf bib
An Improved Model for Voicing Silent Speech
David Gaddy | Dan Klein
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

In this paper, we present an improved model for voicing silent speech, where audio is synthesized from facial electromyography (EMG) signals. To give our model greater flexibility to learn its own input features, we directly use EMG signals as input in the place of hand-designed features used by prior work. Our model uses convolutional layers to extract features from the signals and Transformer layers to propagate information across longer distances. To provide better signal for learning, we also introduce an auxiliary task of predicting phoneme labels in addition to predicting speech audio features. On an open vocabulary intelligibility evaluation, our model improves the state of the art for this task by an absolute 25.8%.

2020

pdf bib
Task-Oriented Dialogue as Dataflow Synthesis
Jacob Andreas | John Bufe | David Burkett | Charles Chen | Josh Clausman | Jean Crawford | Kate Crim | Jordan DeLoach | Leah Dorner | Jason Eisner | Hao Fang | Alan Guo | David Hall | Kristin Hayes | Kellie Hill | Diana Ho | Wendy Iwaszuk | Smriti Jha | Dan Klein | Jayant Krishnamurthy | Theo Lanman | Percy Liang | Christopher H. Lin | Ilya Lintsbakh | Andy McGovern | Aleksandr Nisnevich | Adam Pauls | Dmitrij Petters | Brent Read | Dan Roth | Subhro Roy | Jesse Rusak | Beth Short | Div Slomin | Ben Snyder | Stephon Striplin | Yu Su | Zachary Tellman | Sam Thomson | Andrei Vorobev | Izabela Witoszko | Jason Wolfe | Abby Wray | Yuchen Zhang | Alexander Zotov
Transactions of the Association for Computational Linguistics, Volume 8

We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph. A dialogue agent maps each user utterance to a program that extends this graph. Programs include metacomputation operators for reference and revision that reuse dataflow fragments from previous turns. Our graph-based state enables the expression and manipulation of complex user intents, and explicit metacomputation makes these intents easier for learned models to predict. We introduce a new dataset, SMCalFlow, featuring complex dialogues about events, weather, places, and people. Experiments show that dataflow graphs and metacomputation substantially improve representability and predictability in these natural dialogues. Additional experiments on the MultiWOZ dataset show that our dataflow representation enables an otherwise off-the-shelf sequence-to-sequence model to match the best existing task-specific state tracking model. The SMCalFlow dataset, code for replicating experiments, and a public leaderboard are available at https://www.microsoft.com/en-us/research/project/dataflow-based-dialogue-semantic-machines.

pdf bib
Semantic Scaffolds for Pseudocode-to-Code Generation
Ruiqi Zhong | Mitchell Stern | Dan Klein
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose a method for program generation based on semantic scaffolds, lightweight structures representing the high-level semantic and syntactic composition of a program. By first searching over plausible scaffolds then using these as constraints for a beam search over programs, we achieve better coverage of the search space when compared with existing techniques. We apply our hierarchical search method to the SPoC dataset for pseudocode-to-code generation, in which we are given line-level natural language pseudocode annotations and aim to produce a program satisfying execution-based test cases. By using semantic scaffolds during inference, we achieve a 10% absolute improvement in top-100 accuracy over the previous state-of-the-art. Additionally, we require only 11 candidates to reach the top-3000 performance of the previous best approach when tested against unseen problems, demonstrating a substantial improvement in efficiency.

pdf bib
Tetra-Tagging: Word-Synchronous Parsing with Linear-Time Inference
Nikita Kitaev | Dan Klein
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present a constituency parsing algorithm that, like a supertagger, works by assigning labels to each word in a sentence. In order to maximally leverage current neural architectures, the model scores each word’s tags in parallel, with minimal task-specific structure. After scoring, a left-to-right reconciliation phase extracts a tree in (empirically) linear time. Our parser achieves 95.4 F1 on the WSJ test set while also achieving substantial speedups compared to current state-of-the-art parsers with comparable accuracies.

pdf bib
Semantic Evaluation for Text-to-SQL with Distilled Test Suites
Ruiqi Zhong | Tao Yu | Dan Klein
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose test suite accuracy to approximate semantic accuracy for Text-to-SQL models. Our method distills a small test suite of databases that achieves high code coverage for the gold query from a large number of randomly generated databases. At evaluation time, it computes the denotation accuracy of the predicted queries on the distilled test suite, hence calculating a tight upper-bound for semantic accuracy efficiently. We use our proposed method to evaluate 21 models submitted to the Spider leader board and manually verify that our method is always correct on 100 examples. In contrast, the current Spider metric leads to a 2.5% false negative rate on average and 8.1% in the worst case, indicating that test suite accuracy is needed. Our implementation, along with distilled test suites for eleven Text-to-SQL datasets, is publicly available.

pdf bib
A Streaming Approach For Efficient Batched Beam Search
Kevin Yang | Violet Yao | John DeNero | Dan Klein
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose an efficient batching strategy for variable-length decoding on GPU architectures. During decoding, when candidates terminate or are pruned according to heuristics, our streaming approach periodically “refills” the batch before proceeding with a selected subset of candidates. We apply our method to variable-width beam search on a state-of-the-art machine translation model. Our method decreases runtime by up to 71% compared to a fixed-width beam search baseline and 17% compared to a variable-width baseline, while matching baselines’ BLEU. Finally, experiments show that our method can speed up decoding in other domains, such as semantic and syntactic parsing.

pdf bib
Unsupervised Parsing via Constituency Tests
Steven Cao | Nikita Kitaev | Dan Klein
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose a method for unsupervised parsing based on the linguistic notion of a constituency test. One type of constituency test involves modifying the sentence via some transformation (e.g. replacing the span with a pronoun) and then judging the result (e.g. checking if it is grammatical). Motivated by this idea, we design an unsupervised parser by specifying a set of transformations and using an unsupervised neural acceptability model to make grammaticality decisions. To produce a tree given a sentence, we score each span by aggregating its constituency test judgments, and we choose the binary tree with the highest total score. While this approach already achieves performance in the range of current methods, we further improve accuracy by fine-tuning the grammaticality model through a refinement procedure, where we alternate between improving the estimated trees and improving the grammaticality model. The refined model achieves 62.8 F1 on the Penn Treebank test set, an absolute improvement of 7.6 points over the previously best published result.

pdf bib
Digital Voicing of Silent Speech
David Gaddy | Dan Klein
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we consider the task of digitally voicing silent speech, where silently mouthed words are converted to audible speech based on electromyography (EMG) sensor measurements that capture muscle impulses. While prior work has focused on training speech synthesis models from EMG collected during vocalized speech, we are the first to train from EMG collected during silently articulated speech. We introduce a method of training on silent EMG by transferring audio targets from vocalized to silent signals. Our method greatly improves intelligibility of audio generated from silent EMG compared to a baseline that only trains with vocalized data, decreasing transcription word error rate from 64% to 4% in one data condition and 88% to 68% in another. To spur further development on this task, we share our new dataset of silent and vocalized facial EMG measurements.

2019

pdf bib
Cross-Domain Generalization of Neural Constituency Parsers
Daniel Fried | Nikita Kitaev | Dan Klein
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural parsers obtain state-of-the-art results on benchmark treebanks for constituency parsing—but to what degree do they generalize to other domains? We present three results about the generalization of neural parsers in a zero-shot setting: training on trees from one corpus and evaluating on out-of-domain corpora. First, neural and non-neural parsers generalize comparably to new domains. Second, incorporating pre-trained encoder representations into neural parsers substantially improves their performance across all domains, but does not give a larger relative improvement for out-of-domain treebanks. Finally, despite the rich input representations they learn, neural parsers still benefit from structured output prediction of output trees, yielding higher exact match accuracy and stronger generalization both to larger text spans and to out-of-domain corpora. We analyze generalization on English and Chinese corpora, and in the process obtain state-of-the-art parsing results for the Brown, Genia, and English Web treebanks.

pdf bib
Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following
David Gaddy | Dan Klein
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We consider the problem of learning to map from natural language instructions to state transitions (actions) in a data-efficient manner. Our method takes inspiration from the idea that it should be easier to ground language to concepts that have already been formed through pre-linguistic observation. We augment a baseline instruction-following learner with an initial environment-learning phase that uses observations of language-free state transitions to induce a suitable latent representation of actions before processing the instruction-following training data. We show that mapping to pre-learned representations substantially improves performance over systems whose representations are learned from limited instructional data alone.

pdf bib
Multilingual Constituency Parsing with Self-Attention and Pre-Training
Nikita Kitaev | Steven Cao | Dan Klein
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions. We first compare the benefits of no pre-training, fastText, ELMo, and BERT for English and find that BERT outperforms ELMo, in large part due to increased model capacity, whereas ELMo in turn outperforms the non-contextual fastText embeddings. We also find that pre-training is beneficial across all 11 languages tested; however, large model sizes (more than 100 million parameters) make it computationally expensive to train separate models for each language. To address this shortcoming, we show that joint multilingual pre-training and fine-tuning allows sharing all but a small number of parameters between ten languages in the final model. The 10x reduction in model size compared to fine-tuning one model per language causes only a 3.2% relative error increase in aggregate. We further explore the idea of joint fine-tuning and show that it gives low-resource languages a way to benefit from the larger datasets of other languages. Finally, we demonstrate new state-of-the-art results for 11 languages, including English (95.8 F1) and Chinese (91.8 F1).

pdf bib
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation
Ronghang Hu | Daniel Fried | Anna Rohrbach | Dan Klein | Trevor Darrell | Kate Saenko
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Vision-and-Language Navigation (VLN) requires grounding instructions, such as “turn right and stop at the door”, to routes in a visual environment. The actual grounding can connect language to the environment through multiple modalities, e.g. “stop at the door” might ground into visual objects, while “turn right” might rely only on the geometric structure of a route. We investigate where the natural language empirically grounds under two recent state-of-the-art VLN models. Surprisingly, we discover that visual features may actually hurt these models: models which only use route structure, ablating visual features, outperform their visual counterparts in unseen new environments on the benchmark Room-to-Room dataset. To better use all the available modalities, we propose to decompose the grounding procedure into a set of expert models with access to different modalities (including object detections) and ensemble them at prediction time, improving the performance of state-of-the-art models on the VLN task.

pdf bib
Pragmatically Informative Text Generation
Sheng Shen | Daniel Fried | Jacob Andreas | Dan Klein
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We improve the informativeness of models for conditional text generation using techniques from computational pragmatics. These techniques formulate language production as a game between speakers and listeners, in which a speaker should generate output text that a listener can use to correctly identify the original input that the text describes. While such approaches are widely used in cognitive science and grounded language learning, they have received less attention for more standard language generation tasks. We consider two pragmatic modeling methods for text generation: one where pragmatics is imposed by information preservation, and another where pragmatics is imposed by explicit modeling of distractors. We find that these methods improve the performance of strong existing systems for abstractive summarization and generation from structured meaning representations.

pdf bib
A Deep Factorization of Style and Structure in Fonts
Nikita Srivatsan | Jonathan Barron | Dan Klein | Taylor Berg-Kirkpatrick
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose a deep factorization model for typographic analysis that disentangles content from style. Specifically, a variational inference procedure factors each training glyph into the combination of a character-specific content embedding and a latent font-specific style variable. The underlying generative model combines these factors through an asymmetric transpose convolutional process to generate the image of the glyph itself. When trained on corpora of fonts, our model learns a manifold over font styles that can be used to analyze or reconstruct new, unseen fonts. On the task of reconstructing missing glyphs from an unknown font given only a small number of observations, our model outperforms both a strong nearest neighbors baseline and a state-of-the-art discriminative model from prior work.

2018

pdf bib
Constituency Parsing with a Self-Attentive Encoder
Nikita Kitaev | Dan Klein
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We demonstrate that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser. The use of attention makes explicit the manner in which information is propagated between different locations in the sentence, which we use to both analyze our model and propose potential improvements. For example, we find that separating positional and content information in the encoder can lead to improved parsing accuracy. Additionally, we evaluate different approaches for lexical representation. Our parser achieves new state-of-the-art results for single models trained on the Penn Treebank: 93.55 F1 without the use of any external data, and 95.13 F1 when using pre-trained word representations. Our parser also outperforms the previous best-published accuracy figures on 8 of the 9 languages in the SPMRL dataset.

pdf bib
Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing
Daniel Fried | Dan Klein
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Dynamic oracles provide strong supervision for training constituency parsers with exploration, but must be custom defined for a given parser’s transition system. We explore using a policy gradient method as a parser-agnostic alternative. In addition to directly optimizing for a tree-level metric such as F1, policy gradient has the potential to reduce exposure bias by allowing exploration during training; moreover, it does not require a dynamic oracle for supervision. On four constituency parsers in three languages, the method substantially outperforms static oracle likelihood training in almost all settings. For parsers where a dynamic oracle is available (including a novel oracle which we define for the transition system of Dyer et al., 2016), policy gradient typically recaptures a substantial fraction of the performance gain afforded by the dynamic oracle.

pdf bib
What’s Going On in Neural Constituency Parsers? An Analysis
David Gaddy | Mitchell Stern | Dan Klein
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

A number of differences have emerged between modern and classic approaches to constituency parsing in recent years, with structural components like grammars and feature-rich lexicons becoming less central while recurrent neural network representations rise in popularity. The goal of this work is to analyze the extent to which information provided directly by the model structure in classical systems is still being captured by neural methods. To this end, we propose a high-performance neural model (92.08 F1 on PTB) that is representative of recent work and perform a series of investigative experiments. We find that our model implicitly learns to encode much of the same information that was explicitly provided by grammars and lexicons in the past, indicating that this scaffolding can largely be subsumed by powerful general-purpose neural machinery.

pdf bib
Unified Pragmatic Models for Generating and Following Instructions
Daniel Fried | Jacob Andreas | Dan Klein
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We show that explicit pragmatic inference aids in correctly generating and following natural language instructions for complex, sequential tasks. Our pragmatics-enabled models reason about why speakers produce certain instructions, and about how listeners will react upon hearing them. Like previous pragmatic models, we use learned base listener and speaker models to build a pragmatic speaker that uses the base listener to simulate the interpretation of candidate descriptions, and a pragmatic listener that reasons counterfactually about alternative descriptions. We extend these models to tasks with sequential structure. Evaluation of language generation and interpretation shows that pragmatic inference improves state-of-the-art listener models (at correctly interpreting human instructions) and speaker models (at producing instructions correctly interpreted by humans) in diverse settings.

pdf bib
Learning with Latent Language
Jacob Andreas | Dan Klein | Sergey Levine
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

The named concepts and compositional operators present in natural language provide a rich source of information about the abstractions humans use to navigate the world. Can this linguistic background knowledge improve the generality and efficiency of learned classifiers and control policies? This paper aims to show that using the space of natural language strings as a parameter space is an effective way to capture natural task structure. In a pretraining phase, we learn a language interpretation model that transforms inputs (e.g. images) into outputs (e.g. labels) given natural language descriptions. To learn a new concept (e.g. a classifier), we search directly in the space of descriptions to minimize the interpreter’s loss on training examples. Crucially, our models do not require language data to learn these concepts: language is used only in pretraining to impose structure on subsequent learning. Results on image classification, text editing, and reinforcement learning show that, in all settings, models with a linguistic parameterization outperform those without.

2017

pdf bib
Where is Misty? Interpreting Spatial Descriptors by Modeling Regions in Space
Nikita Kitaev | Dan Klein
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a model for locating regions in space based on natural language descriptions. Starting with a 3D scene and a sentence, our model is able to associate words in the sentence with regions in the scene, interpret relations such as ‘on top of’ or ‘next to,’ and finally locate the region described in the sentence. All components form a single neural network that is trained end-to-end without prior knowledge of object segmentation. To evaluate our model, we construct and release a new dataset consisting of Minecraft scenes with crowdsourced natural language descriptions. We achieve a 32% relative error reduction compared to a strong neural baseline.

pdf bib
Effective Inference for Generative Neural Parsing
Mitchell Stern | Daniel Fried | Dan Klein
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Generative neural models have recently achieved state-of-the-art results for constituency parsing. However, without a feasible search procedure, their use has so far been limited to reranking the output of external parsers in which decoding is more tractable. We describe an alternative to the conventional action-level beam search used for discriminative neural models that enables us to decode directly in these generative models. We then show that by improving our basic candidate selection strategy and using a coarse pruning function, we can improve accuracy while exploring significantly less of the search space. Applied to the model of Choe and Charniak (2016), our inference procedure obtains 92.56 F1 on section 23 of the Penn Treebank, surpassing prior state-of-the-art results for single-model systems.

pdf bib
Analogs of Linguistic Structure in Deep Representations
Jacob Andreas | Dan Klein
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We investigate the compositional structure of message vectors computed by a deep network trained on a communication game. By comparing truth-conditional representations of encoder-produced message vectors to human-produced referring expressions, we are able to identify aligned (vector, utterance) pairs with the same meaning. We then search for structured relationships among these aligned pairs to discover simple vector space transformations corresponding to negation, conjunction, and disjunction. Our results suggest that neural representations are capable of spontaneously developing a “syntax” with functional analogues to qualitative properties of natural language.

pdf bib
Translating Neuralese
Jacob Andreas | Anca Dragan | Dan Klein
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Several approaches have recently been proposed for learning decentralized deep multiagent policies that coordinate via a differentiable communication channel. While these policies are effective for many tasks, interpretation of their induced communication strategies has remained a challenge. Here we propose to interpret agents’ messages by translating them. Unlike in typical machine translation problems, we have no parallel data to learn from. Instead we develop a translation model based on the insight that agent messages and natural language strings mean the same thing if they induce the same belief about the world in a listener. We present theoretical guarantees and empirical evidence that our approach preserves both the semantics and pragmatics of messages by ensuring that players communicating through a translation layer do not suffer a substantial loss in reward relative to players with a common language.

pdf bib
A Minimal Span-Based Neural Constituency Parser
Mitchell Stern | Jacob Andreas | Dan Klein
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this work, we present a minimal neural model for constituency parsing based on independent scoring of labels and spans. We show that this model is not only compatible with classical dynamic programming techniques, but also admits a novel greedy top-down inference algorithm based on recursive partitioning of the input. We demonstrate empirically that both prediction schemes are competitive with recent work, and when combined with basic extensions to the scoring model are capable of achieving state-of-the-art single-model performance on the Penn Treebank (91.79 F1) and strong performance on the French Treebank (82.23 F1).

pdf bib
Abstract Syntax Networks for Code Generation and Semantic Parsing
Maxim Rabinovich | Mitchell Stern | Dan Klein
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Tasks like code generation and semantic parsing require mapping unstructured (or partially structured) inputs to well-formed, executable outputs. We introduce abstract syntax networks, a modeling framework for these problems. The outputs are represented as abstract syntax trees (ASTs) and constructed by a decoder with a dynamically-determined modular structure paralleling the structure of the output tree. On the benchmark Hearthstone dataset for code generation, our model obtains 79.2 BLEU and 22.7% exact match accuracy, compared to previous state-of-the-art values of 67.1 and 6.1%. Furthermore, we perform competitively on the Atis, Jobs, and Geo semantic parsing datasets with no task-specific engineering.

pdf bib
Improving Neural Parsing by Disentangling Model Combination and Reranking Effects
Daniel Fried | Mitchell Stern | Dan Klein
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Recent work has proposed several generative neural models for constituency parsing that achieve state-of-the-art results. Since direct search in these generative models is difficult, they have primarily been used to rescore candidate outputs from base parsers in which decoding is more straightforward. We first present an algorithm for direct search in these generative models. We then demonstrate that the rescoring results are at least partly due to implicit model combination rather than reranking effects. Finally, we show that explicit model combination can improve performance even further, resulting in new state-of-the-art numbers on the PTB of 94.25 F1 when training only on gold data and 94.66 F1 when using external data.

pdf bib
Fine-Grained Entity Typing with High-Multiplicity Assignments
Maxim Rabinovich | Dan Klein
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

As entity type systems become richer and more fine-grained, we expect the number of types assigned to a given entity to increase. However, most fine-grained typing work has focused on datasets that exhibit a low degree of type multiplicity. In this paper, we consider the high-multiplicity regime inherent in data sources such as Wikipedia that have semi-open type systems. We introduce a set-prediction approach to this problem and show that our model outperforms unstructured baselines on a new Wikipedia-based fine-grained typing corpus.

pdf bib
Parsing with Traces: An O(n4) Algorithm and a Structural Representation
Jonathan K. Kummerfeld | Dan Klein
Transactions of the Association for Computational Linguistics, Volume 5

General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons. We propose a new representation and algorithm for a class of graph structures that is flexible enough to cover almost all treebank structures, while still admitting efficient learning and inference. In particular, we consider directed, acyclic, one-endpoint-crossing graph structures, which cover most long-distance dislocation, shared argumentation, and similar tree-violating linguistic phenomena. We describe how to convert phrase structure parses, including traces, to our new representation in a reversible manner. Our dynamic program uniquely decomposes structures, is sound and complete, and covers 97.3% of the Penn English Treebank. We also implement a proof-of-concept parser that recovers a range of null elements and trace types.

2016

pdf bib
Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints
Greg Durrett | Taylor Berg-Kirkpatrick | Dan Klein
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Capturing Semantic Similarity for Entity Linking with Convolutional Neural Networks
Matthew Francis-Landau | Greg Durrett | Dan Klein
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Learning to Compose Neural Networks for Question Answering
Jacob Andreas | Marcus Rohrbach | Trevor Darrell | Dan Klein
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Reasoning about Pragmatics with Neural Listeners and Speakers
Jacob Andreas | Dan Klein
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

pdf bib
When and why are log-linear models self-normalizing?
Jacob Andreas | Dan Klein
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Disfluency Detection with a Semi-Markov Model and Prosodic Features
James Ferguson | Greg Durrett | Dan Klein
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Unsupervised Code-Switching for Multilingual Historical Document Transcription
Dan Garrette | Hannah Alpert-Abrams | Taylor Berg-Kirkpatrick | Dan Klein
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
GPU-Friendly Local Regression for Voice Conversion
Taylor Berg-Kirkpatrick | Dan Klein
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Neural CRF Parsing
Greg Durrett | Dan Klein
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
An Empirical Analysis of Optimization for Max-Margin NLP
Jonathan K. Kummerfeld | Taylor Berg-Kirkpatrick | Dan Klein
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Alignment-Based Compositional Semantics for Instruction Following
Jacob Andreas | Dan Klein
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib
Sparser, Better, Faster GPU Parsing
David Hall | Taylor Berg-Kirkpatrick | Dan Klein
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Less Grammar, More Features
David Hall | Greg Durrett | Dan Klein
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Structured Learning for Taxonomy Induction with Belief Propagation
Mohit Bansal | David Burkett | Gerard de Melo | Dan Klein
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Improved Typesetting Models for Historical OCR
Taylor Berg-Kirkpatrick | Dan Klein
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
How much do word embeddings encode about syntax?
Jacob Andreas | Dan Klein
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
A Joint Model for Entity Analysis: Coreference, Typing, and Linking
Greg Durrett | Dan Klein
Transactions of the Association for Computational Linguistics, Volume 2

We present a joint model of three core tasks in the entity analysis stack: coreference resolution (within-document clustering), named entity recognition (coarse semantic typing), and entity linking (matching to Wikipedia entities). Our model is formally a structured conditional random field. Unary factors encode local features from strong baselines for each task. We then add binary and ternary factors to capture cross-task interactions, such as the constraint that coreferent mentions have the same semantic type. On the ACE 2005 and OntoNotes datasets, we achieve state-of-the-art results for all three tasks. Moreover, joint modeling improves performance on each task over strong independent baselines.

pdf bib
Grounding Language with Points and Paths in Continuous Spaces
Jacob Andreas | Dan Klein
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

2013

pdf bib
Error-Driven Analysis of Challenges in Coreference Resolution
Jonathan K. Kummerfeld | Dan Klein
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Decipherment with a Million Random Restarts
Taylor Berg-Kirkpatrick | Dan Klein
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Multi-Teraflop Constituency Parser using GPUs
John Canny | David Hall | Dan Klein
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Easy Victories and Uphill Battles in Coreference Resolution
Greg Durrett | Dan Klein
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Learning Dependency-Based Compositional Semantics
Percy Liang | Michael I. Jordan | Dan Klein
Computational Linguistics, Volume 39, Issue 2 - June 2013

pdf bib
Decentralized Entity-Level Modeling for Coreference Resolution
Greg Durrett | David Hall | Dan Klein
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Unsupervised Transcription of Historical Documents
Taylor Berg-Kirkpatrick | Greg Durrett | Dan Klein
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
An Empirical Examination of Challenges in Chinese Parsing
Jonathan K. Kummerfeld | Daniel Tse | James R. Curran | Dan Klein
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Variational Inference for Structured NLP Models
David Burkett | Dan Klein
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Tutorials)

2012

pdf bib
Fast Inference in Phrase Extraction Models with Belief Propagation
David Burkett | Dan Klein
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Variational Inference for Structured NLP Models
David Burkett | Dan Klein
Tutorial Abstracts at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Syntactic Transfer Using a Bilingual Lexicon
Greg Durrett | Adam Pauls | Dan Klein
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Transforming Trees to Improve Syntactic Convergence
David Burkett | Dan Klein
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
An Empirical Investigation of Statistical Significance in NLP
Taylor Berg-Kirkpatrick | David Burkett | Dan Klein
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
Jonathan K. Kummerfeld | David Hall | James R. Curran | Dan Klein
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Training Factored PCFGs with Expectation Propagation
David Hall | Dan Klein
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Coreference Semantics from Web Features
Mohit Bansal | Dan Klein
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Large-Scale Syntactic Language Modeling with Treelets
Adam Pauls | Dan Klein
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Robust Conversion of CCG Derivations to Phrase Structure Trees
Jonathan K. Kummerfeld | Dan Klein | James R. Curran
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

pdf bib
Simple Effective Decipherment via Combinatorial Optimization
Taylor Berg-Kirkpatrick | Dan Klein
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Large-Scale Cognate Recovery
David Hall | Dan Klein
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Faster and Smaller N-Gram Language Models
Adam Pauls | Dan Klein
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Jointly Learning to Extract and Compress
Taylor Berg-Kirkpatrick | Dan Gillick | Dan Klein
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Learning Dependency-Based Compositional Semantics
Percy Liang | Michael Jordan | Dan Klein
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Web-Scale Features for Full-Scale Parsing
Mohit Bansal | Dan Klein
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
An Empirical Investigation of Discounting in Cross-Domain Language Models
Greg Durrett | Dan Klein
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
The Surprising Variance in Shortest-Derivation Parsing
Mohit Bansal | Dan Klein
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Mention Detection: Heuristics for the OntoNotes annotations
Jonathan K. Kummerfeld | Mohit Bansal | David Burkett | Dan Klein
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

2010

pdf bib
A Game-Theoretic Approach to Generating Spatial Descriptions
Dave Golland | Percy Liang | Dan Klein
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Simple Domain-Independent Probabilistic Approach to Generation
Gabor Angeli | Percy Liang | Dan Klein
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Learning Better Monolingual Models with Unannotated Bilingual Text
David Burkett | Slav Petrov | John Blitzer | Dan Klein
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

pdf bib
Finding Cognate Groups Using Phylogenies
David Hall | Dan Klein
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Simple, Accurate Parsing with an All-Fragments Grammar
Mohit Bansal | Dan Klein
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Phylogenetic Grammar Induction
Taylor Berg-Kirkpatrick | Dan Klein
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Discriminative Modeling of Extraction Sets for Machine Translation
John DeNero | Dan Klein
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Top-Down K-Best A* Parsing
Adam Pauls | Dan Klein | Chris Quirk
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
An Entity-Level Approach to Information Extraction
Aria Haghighi | Dan Klein
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Hierarchical A* Parsing with Bridge Outside Scores
Adam Pauls | Dan Klein
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Unsupervised Syntactic Alignment with Inversion Transduction Grammars
Adam Pauls | Dan Klein | David Chiang | Kevin Knight
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Joint Parsing and Alignment with Weakly Synchronized Grammars
David Burkett | John Blitzer | Dan Klein
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Coreference Resolution in a Modular, Entity-Centered Model
Aria Haghighi | Dan Klein
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Type-Based MCMC
Percy Liang | Michael I. Jordan | Dan Klein
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Painless Unsupervised Learning with Features
Taylor Berg-Kirkpatrick | Alexandre Bouchard-Côté | John DeNero | Dan Klein
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

pdf bib
Improved Reconstruction of Protolanguage Word Forms
Alexandre Bouchard-Côté | Thomas L. Griffiths | Dan Klein
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Efficient Parsing for Transducer Grammars
John DeNero | Mohit Bansal | Adam Pauls | Dan Klein
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Hierarchical Search for Parsing
Adam Pauls | Dan Klein
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Online EM for Unsupervised Models
Percy Liang | Dan Klein
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Learning Semantic Correspondences with Less Supervision
Percy Liang | Michael Jordan | Dan Klein
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Better Word Alignments with Supervised ITG Models
Aria Haghighi | John Blitzer | John DeNero | Dan Klein
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
K-Best A* Parsing
Adam Pauls | Dan Klein
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Asynchronous Binarization for Synchronous Grammars
John DeNero | Adam Pauls | Dan Klein
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf bib
Simple Coreference Resolution with Rich Syntactic and Semantic Features
Aria Haghighi | Dan Klein
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Consensus Training for Consensus Decoding in Machine Translation
Adam Pauls | John DeNero | Dan Klein
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

pdf bib
Parsing German with Latent Variable Grammars
Slav Petrov | Dan Klein
Proceedings of the Workshop on Parsing German

pdf bib
Coarse-to-Fine Syntactic Machine Translation using Language Projections
Slav Petrov | Aria Haghighi | Dan Klein
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Sampling Alignment Structure under a Bayesian Translation Model
John DeNero | Alexandre Bouchard-Côté | Dan Klein
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing
Slav Petrov | Dan Klein
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Two Languages are Better than One (for Syntactic Parsing)
David Burkett | Dan Klein
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Learning Bilingual Lexicons from Monolingual Corpora
Aria Haghighi | Percy Liang | Taylor Berg-Kirkpatrick | Dan Klein
Proceedings of ACL-08: HLT

pdf bib
Analyzing the Errors of Unsupervised Learning
Percy Liang | Dan Klein
Proceedings of ACL-08: HLT

pdf bib
The Complexity of Phrase Alignment Problems
John DeNero | Dan Klein
Proceedings of ACL-08: HLT, Short Papers

2007

pdf bib
Improved Inference for Unlexicalized Parsing
Slav Petrov | Dan Klein
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
Approximate Factoring for A* Search
Aria Haghighi | John DeNero | Dan Klein
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
Introduction to Classification: Likelihoods, Margins, Features, and Kernels
Dan Klein
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

pdf bib
Tailoring Word Alignments to Syntactic Machine Translation
John DeNero | Dan Klein
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
Unsupervised Coreference Resolution in a Nonparametric Bayesian Model
Aria Haghighi | Dan Klein
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
The Infinite PCFG Using Hierarchical Dirichlet Processes
Percy Liang | Slav Petrov | Michael Jordan | Dan Klein
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
A Probabilistic Approach to Diachronic Phonology
Alexandre Bouchard | Percy Liang | Thomas Griffiths | Dan Klein
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
Learning Structured Models for Phone Recognition
Slav Petrov | Adam Pauls | Dan Klein
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf bib
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)
Lluís Màrquez | Dan Klein
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

pdf bib
Non-Local Modeling with a Mixture of PCFGs
Slav Petrov | Leon Barrett | Dan Klein
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

pdf bib
Why Generative Phrase Models Underperform Surface Heuristics
John DeNero | Dan Gillick | James Zhang | Dan Klein
Proceedings on the Workshop on Statistical Machine Translation

pdf bib
Learning Accurate, Compact, and Interpretable Tree Annotation
Slav Petrov | Leon Barrett | Romain Thibaux | Dan Klein
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
An End-to-End Discriminative Approach to Machine Translation
Percy Liang | Alexandre Bouchard-Côté | Dan Klein | Ben Taskar
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Prototype-Driven Grammar Induction
Aria Haghighi | Dan Klein
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Alignment by Agreement
Percy Liang | Ben Taskar | Dan Klein
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf bib
Word Alignment via Quadratic Assignment
Simon Lacoste-Julien | Ben Taskar | Dan Klein | Michael I. Jordan
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf bib
Prototype-Driven Learning for Sequence Models
Aria Haghighi | Dan Klein
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

pdf bib
A Core-Tools Statistical NLP Course
Dan Klein
Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL

pdf bib
Unsupervised Learning of Field Segmentation Models for Information Extraction
Trond Grenager | Dan Klein | Christopher Manning
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

pdf bib
A Discriminative Matching Approach to Word Alignment
Ben Taskar | Simon Lacoste-Julien | Dan Klein
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

pdf bib
Max-Margin Parsing
Ben Taskar | Dan Klein | Michael Collins | Daphne Koller | Christopher Manning
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

pdf bib
Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency
Dan Klein | Christopher Manning
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

pdf bib
Named Entity Recognition with Character-Level Models
Dan Klein | Joseph Smarr | Huy Nguyen | Christopher D. Manning
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

pdf bib
A* Parsing: Fast Exact Viterbi Parse Selection
Dan Klein | Christopher D. Manning
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network
Kristina Toutanova | Dan Klein | Christopher D. Manning | Yoram Singer
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Optimization, Maxent Models, and Conditional Estimation without Magic
Christopher Manning | Dan Klein
Companion Volume of the Proceedings of HLT-NAACL 2003 - Tutorial Abstracts

pdf bib
Accurate Unlexicalized Parsing
Dan Klein | Christopher D. Manning
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

2002

pdf bib
Combining Heterogeneous Classifiers for Word Sense Disambiguation
Dan Klein | Kristina Toutanova | H. Tolga Ilhan | Sepandar D. Kamvar | Christopher D. Manning
Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions

pdf bib
Conditional Structure versus Conditional Estimation in NLP Models
Dan Klein | Christopher D. Manning
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

pdf bib
A Generative Constituent-Context Model for Improved Grammar Induction
Dan Klein | Christopher D. Manning
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

pdf bib
Distributional phrase structure induction
Dan Klein | Christopher D. Manning
Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning (ConLL)

pdf bib
Parsing and Hypergraphs
Dan Klein | Christopher D. Manning
Proceedings of the Seventh International Workshop on Parsing Technologies

pdf bib
Combining Heterogeneous Classifiers for Word-Sense Disambiguation
H. Tolga Ilhan | Sepandar D. Kamvar | Dan Klein | Christopher D. Manning | Kristina Toutanova
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

pdf bib
Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank
Dan Klein | Christopher D. Manning
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

Search
Co-authors