Ying Chen


2023

Word-level Prefix/Suffix Sense Detection: A Case Study on Negation Sense with Few-shot Learning
Yameng Li | Zicheng Li | Ying Chen | Shoushan Li
Findings of the Association for Computational Linguistics: ACL 2023

Morphological analysis is an important research issue in the field of natural language processing. In this study, we propose a context-free morphological analysis task, namely word-level prefix/suffix sense detection, which deals with the ambiguity of sense expressed by prefix/suffix. To research this novel task, we first annotate a corpus with prefixes/suffixes expressing negation (e.g., il-, un-, -less) and then propose a novel few-shot learning approach that applies an input-augmentation prompt to a token-replaced detection pre-training model. Empirical studies demonstrate the effectiveness of the proposed approach to word-level prefix/suffix negation sense detection.

2022

DuReader-Retrieval: A Large-scale Chinese Benchmark for Passage Retrieval from Web Search Engine
Yifu Qiu | Hongyu Li | Yingqi Qu | Ying Chen | QiaoQiao She | Jing Liu | Hua Wu | Haifeng Wang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In this paper, we present DuReader-retrieval, a large-scale Chinese dataset for passage retrieval. DuReader-retrieval contains more than 90K queries and over 8M unique passages from a commercial search engine. To alleviate the shortcomings of other datasets and ensure the quality of our benchmark, we (1) reduce the false negatives in development and test sets by manually annotating results pooled from multiple retrievers, and (2) remove the training queries that are semantically similar to the development and testing queries. Additionally, we provide two out-of-domain testing sets for cross-domain evaluation, as well as a set of human translated queries for cross-lingual retrieval evaluation. The experiments demonstrate that DuReader-retrieval is challenging and a number of problems remain unsolved, such as the salient phrase mismatch and the syntactic mismatch between queries and paragraphs. These experiments also show that dense retrievers do not generalize well across domains, and cross-lingual retrieval is essentially challenging. DuReader-retrieval is publicly available at https://github.com/baidu/DuReader/tree/master/DuReader-Retrieval.

DuQM: A Chinese Dataset of Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models
Hongyu Zhu | Yan Chen | Jing Yan | Jing Liu | Yu Hong | Ying Chen | Hua Wu | Haifeng Wang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In this paper, we focus on the robustness evaluation of Chinese Question Matching (QM) models. Most of the previous work on analyzing robustness issues focuses on just one or a few types of artificial adversarial examples. Instead, we argue that a comprehensive evaluation should be conducted on natural texts, which takes into account the fine-grained linguistic capabilities of QM models. For this purpose, we create a Chinese dataset, namely DuQM, which contains natural questions with linguistic perturbations to evaluate the robustness of QM models. DuQM contains 3 categories and 13 subcategories with 32 linguistic perturbations. The extensive experiments demonstrate that DuQM has a better ability to distinguish different models. Importantly, the detailed breakdown of evaluation by the linguistic phenomena in DuQM helps us easily diagnose the strengths and weaknesses of different models. Additionally, our experimental results show that the effect of artificial adversarial examples does not work on natural texts. Our baseline codes and a leaderboard are now publicly available.

A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
Lijie Wang | Yaozong Shen | Shuyuan Peng | Shuai Zhang | Xinyan Xiao | Hao Liu | Hongxuan Tang | Ying Chen | Hua Wu | Haifeng Wang
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)

While there is increasing concern about the interpretability of neural models, the evaluation of interpretability remains an open problem, due to the lack of proper evaluation datasets and metrics. In this paper, we present a novel benchmark to evaluate the interpretability of both neural models and saliency methods. This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity and reading comprehension, each provided with both English and Chinese annotated data. In order to precisely evaluate the interpretability, we provide token-level rationales that are carefully annotated to be sufficient, compact and comprehensive. We also design a new metric, i.e., the consistency between the rationales before and after perturbations, to uniformly evaluate the interpretability on different types of tasks. Based on this benchmark, we conduct experiments on three typical models with three saliency methods, and unveil their strengths and weaknesses in terms of interpretability. We will release this benchmark (https://www.luge.ai/#/luge/task/taskDetail?taskId=15) and hope it can facilitate the research in building trustworthy systems.

2021

BSTC: A Large-Scale Chinese-English Speech Translation Dataset
Ruiqing Zhang | Xiyang Wang | Chuanqiang Zhang | Zhongjun He | Hua Wu | Zhi Li | Haifeng Wang | Ying Chen | Qinfei Li
Proceedings of the Second Workshop on Automatic Simultaneous Translation

This paper presents BSTC (Baidu Speech Translation Corpus), a large-scale Chinese-English speech translation dataset. This dataset is constructed based on a collection of licensed videos of talks or lectures, including about 68 hours of Mandarin data, their manual transcripts and translations into English, as well as automated transcripts by an automatic speech recognition (ASR) model. We have further asked three experienced interpreters to simultaneously interpret the testing talks in a mock conference setting. This corpus is expected to promote the research of automatic simultaneous translation as well as the development of practical systems. We have organized simultaneous translation tasks and used this corpus to evaluate automatic simultaneous translation systems.

2020

End-to-End Emotion-Cause Pair Extraction with Graph Convolutional Network
Ying Chen | Wenjun Hou | Shoushan Li | Caicong Wu | Xiaoqiang Zhang
Proceedings of the 28th International Conference on Computational Linguistics

Emotion-cause pair extraction (ECPE), which aims at simultaneously extracting emotion-cause pairs that express emotions and their corresponding causes in a document, plays a vital role in understanding natural languages. Considering that most emotions usually have few causes mentioned in their contexts, we present a novel end-to-end Pair Graph Convolutional Network (PairGCN) to model pair-level contexts so as to capture the dependency information among local neighborhood candidate pairs. Moreover, in the graphical network, contexts are grouped into three types and each type of context is propagated in its own way. Experiments on a benchmark Chinese emotion-cause pair extraction corpus demonstrate the effectiveness of the proposed model.

2019

CAUnLP at NLP4IF 2019 Shared Task: Context-Dependent BERT for Sentence-Level Propaganda Detection
Wenjun Hou | Ying Chen
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

The goal of fine-grained propaganda detection is to determine whether a given sentence uses propaganda techniques (sentence-level) or to recognize which techniques are used (fragment-level). This paper presents the system of our participation in the sentence-level subtask of the propaganda detection shared task. In order to better utilize the document information, we construct context-dependent input pairs (sentence-title pair and sentence-context pair) to fine-tune the pretrained BERT, and we also use the undersampling method to tackle the problem of imbalanced data.

2018

Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network
Xiangyang Zhou | Lu Li | Daxiang Dong | Yi Liu | Ying Chen | Wayne Xin Zhao | Dianhai Yu | Hua Wu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Humans generate responses relying on semantic and functional dependencies, including coreference relation, among dialogue elements and their context. In this paper, we investigate matching a response with its multi-turn context using dependency information based entirely on attention. Our solution is inspired by the recently proposed Transformer in machine translation (Vaswani et al., 2017) and we extend the attention mechanism in two ways. First, we construct representations of text segments at different granularities solely with stacked self-attention. Second, we try to extract the truly matched segment pairs with attention across the context and response. We jointly introduce those two kinds of attention in one uniform neural network. Experiments on two large-scale multi-turn response selection tasks show that our proposed model significantly outperforms the state-of-the-art models.

Joint Learning for Emotion Classification and Emotion Cause Detection
Ying Chen | Wenjun Hou | Xiyao Cheng | Shoushan Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present a neural network-based joint approach for emotion classification and emotion cause detection, which attempts to capture mutual benefits across the two sub-tasks of emotion analysis. Considering that emotion classification and emotion cause detection need different kinds of features (affective and event-based, respectively), we propose a joint encoder which uses a unified framework to extract features for both sub-tasks and a joint model trainer which simultaneously learns two models for the two sub-tasks separately. Our experiments on Chinese microblogs show that the joint approach is very promising.

2016

Corpus Fusion for Emotion Classification
Suyang Zhu | Shoushan Li | Ying Chen | Guodong Zhou
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Machine learning-based methods have obtained great progress on emotion classification. However, in most previous studies, the models are learned based on a single corpus which often suffers from insufficient labeled data. In this paper, we propose a corpus fusion approach to address emotion classification across two corpora which use different emotion taxonomies. The objective of this approach is to utilize the annotated data from one corpus to help the emotion classification on another corpus. An Integer Linear Programming (ILP) optimization is proposed to refine the classification results. Empirical studies show the effectiveness of the proposed approach to corpus fusion for emotion classification.

2010

A Text-driven Rule-based System for Emotion Cause Detection
Sophia Yat Mei Lee | Ying Chen | Chu-Ren Huang
Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text

Textual Emotion Processing From Event Analysis
Chu-Ren Huang | Ying Chen | Sophia Yat Mei Lee
CIPS-SIGHAN Joint Conference on Chinese Language Processing

The Chinese Persons Name Diambiguation Evaluation: Exploration of Personal Name Disambiguation in Chinese News
Ying Chen | Peng Jin | Wenjie Li | Chu-Ren Huang
CIPS-SIGHAN Joint Conference on Chinese Language Processing

Emotion Cause Events: Corpus Construction and Analysis
Sophia Yat Mei Lee | Ying Chen | Shoushan Li | Chu-Ren Huang
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Emotion processing has always been a great challenge. Given the fact that an emotion is triggered by cause events and that cause events are an integral part of emotion, this paper constructs a Chinese emotion cause corpus as a first step towards automatic inference of cause-emotion correlation. The corpus focuses on five primary emotions, namely happiness, sadness, fear, anger, and surprise. It is annotated with emotion cause events based on our proposed annotation scheme. Corpus data shows that most emotions are expressed with causes, and that causes mostly occur before the corresponding emotion verbs. We also examine the correlations between emotions and cause events in terms of linguistic cues: causative verbs, perception verbs, epistemic markers, conjunctions, prepositions, and others. Results show that each group of linguistic cues serves as an indicator marking the cause events in different structures of emotional constructions. We believe that the emotion cause corpus will be a useful resource for automatic emotion cause detection as well as emotion detection and classification.

Emotion Cause Detection with Linguistic Constructions
Ying Chen | Sophia Yat Mei Lee | Shoushan Li | Chu-Ren Huang
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Sentiment Classification and Polarity Shifting
Shoushan Li | Sophia Y. M. Lee | Ying Chen | Chu-Ren Huang | Guodong Zhou
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

A Cognitive-based Annotation System for Emotion Computing
Ying Chen | Sophia Y. M. Lee | Chu-Ren Huang
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

An Integrated Approach to Heterogeneous Data for Information Extraction
Ying Chen | Sophia Y. M. Lee | Chu-Ren Huang
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

Are Emotions Enumerable or Decomposable? And its Implications for Emotion Processing
Ying Chen | Sophia Y. M. Lee | Chu-Ren Huang
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

Cause Event Representations for Happiness and Surprise
Sophia Yat Mei Lee | Ying Chen | Chu-Ren Huang
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

2007

CU-COMSEM: Exploring Rich Features for Unsupervised Web Personal Name Disambiguation
Ying Chen | James H. Martin
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

Towards Robust Unsupervised Personal Name Disambiguation
Ying Chen | James Martin
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2005

Detection of Entity Mentions Occuring in English and Chinese Text
Kadri Hacioglu | Benjamin Douglas | Ying Chen
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing