Peng Zhang


2024

pdf bib
A Quantum-Inspired Matching Network with Linguistic Theories for Metaphor Detection
Wenbo Qiao | Peng Zhang | ZengLai Ma
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Enabling machines with the capability to recognize and comprehend metaphors is a crucial step toward achieving artificial intelligence. In linguistic theories, metaphor can be identified through Metaphor Identification Procedure (MIP) or Selectional Preference Violation (SPV), both of which are typically considered as matching tasks in the field of natural language processing. However, the implementation of MIP poses a challenge due to the semantic uncertainty and ambiguity of literal meanings of words. Simultaneously, SPV often struggles to recognize conventional metaphors. Inspired by Quantum Language Model (QLM) for modeling semantic uncertainty and fine-grained feature matching, we propose a quantum-inspired matching network for metaphor detection. Specifically, we use the density matrix to explicitly characterize the literal meanings of the target word for MIP, in order to model the uncertainty and ambiguity of the literal meanings of words. This can make SPV effective even in the face of conventional metaphors. MIP and SPV are then achieved by fine-grained feature matching. The results of the experiment finally demonstrated our approach has strong competitiveness.

pdf bib
Distilling Causal Effect of Data in Continual Few-shot Relation Learning
Weihang Ye | Peng Zhang | Jing Zhang | Hui Gao | Moyao Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Continual Few-Shot Relation Learning (CFRL) aims to learn an increasing number of new relational patterns from a data stream. However, due to the limited number of samples and the continual training mode, this method frequently encounters the catastrophic forgetting issues. The research on causal inference suggests that this issue is caused by the loss of causal effects from old data during the new training process. Inspired by the causal graph, we propose a unified causal framework for CFRL to restore the causal effects. Specifically, we establish two additional causal paths from old data to predictions by having the new data and memory data collide with old data separately in the old feature space. This augmentation allows us to preserve causal effects effectively and enhance the utilization of valuable information within memory data, thereby alleviating the phenomenon of catastrophic forgetting. Furthermore, we introduce a self-adaptive weight to achieve a delicate balance of causal effects between the new and old relation types. Extensive experiments demonstrate the superiority of our method over existing state-of-the-art approaches in CFRL task settings. Our codes are publicly available at: https://github.com/ywh140/CECF.

pdf bib
LA-UCL: LLM-Augmented Unsupervised Contrastive Learning Framework for Few-Shot Text Classification
Jing Zhang | Hui Gao | Peng Zhang | Boda Feng | Wenmin Deng | Yuexian Hou
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The few-shot tasks require the model to have the ability to generalize from a few samples. However, due to the lack of cognitive ability, the current works cannot fully utilize limited samples to expand the sample space and still suffer from overfitting issues. To address the problems, we propose a LLM-Augmented Unsupervised Contrastive Learning Framework (LA-UCL), which introduces a cognition-enabled Large Language Model (LLM) for efficient data augmentation, and presents corresponding contrastive learning strategies. Specifically, in the self-augmented contrastive learning module, we construct a retrieval-based in-context prompt scheme by retrieving similar but different category data from the original samples, guiding the LLM to generate more discriminative augmented data. Then, by designing group-level contrastive loss to enhance the model’s discriminative ability. In the external-augmented contrastive learning module, we utilize web knowledge retrieval to expand the sample space and leverage LLM to generate more diverse data, and introduce sample-level contrastive loss for unlabeled data to improve the model’s generalization. Experimental results on six datasets show that our model exceeds the baseline models.

2023

pdf bib
Domain-specific Attention with Distributional Signatures for Multi-Domain End-to-end Task-Oriented Dialogue
Xing Ma | Peng Zhang | Feifei Zhao
Findings of the Association for Computational Linguistics: ACL 2023

The end-to-end task-oriented dialogue system has achieved great success in recent years. Most of these dialogue systems need to accommodate multi-domain dialogue in real-world scenarios. However, due to the high cost of dialogue data annotation and the scarcity of labeled dialogue data, existing methods are difficult to extend to new domains. Therefore, it is important to use limited data to construct multi-domain dialogue systems. To solve this problem, we propose a novel domain attention module. It use the distributional signatures to construct a multi-domain dialogue system effectively with limited data, which has strong extensibility. We also define a adjacent n-gram pattern to explore potential patterns for dialogue entities. Experimental results show that our approach outperforms the baseline models on most metrics. In the few-shot scenario, we show our method get a great improvement compared with previous methods while keeping smaller model scale.

pdf bib
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method
Shicheng Tan | Weng Lam Tam | Yuanchun Wang | Wenwen Gong | Shu Zhao | Peng Zhang | Jie Tang
Findings of the Association for Computational Linguistics: ACL 2023

The large scale of pre-trained language models poses a challenge for their deployment on various devices, with a growing emphasis on methods to compress these models, particularly knowledge distillation. However, current knowledge distillation methods rely on the model’s intermediate layer features and the golden labels (also called hard labels), which usually require aligned model architecture and enough labeled data respectively. Moreover, the parameters of vocabulary are usually neglected in existing methods. To address these problems, we propose a general language model distillation (GLMD) method that performs two-stage word prediction distillation and vocabulary compression, which is simple and surprisingly shows extremely strong performance. Specifically, GLMD supports more general application scenarios by eliminating the constraints of dimension and structure between models and the need for labeled datasets through the absence of intermediate layers and golden labels. Meanwhile, based on the long-tailed distribution of word frequencies in the data, GLMD designs a strategy of vocabulary compression through decreasing vocabulary size instead of dimensionality. Experimental results show that our method outperforms 25 state-of-the-art methods on the SuperGLUE benchmark, achieving an average score that surpasses the best method by 3%.

pdf bib
LightFormer: Light-weight Transformer Using SVD-based Weight Transfer and Parameter Sharing
Xiuqing Lv | Peng Zhang | Sunzhu Li | Guobing Gan | Yueheng Sun
Findings of the Association for Computational Linguistics: ACL 2023

Transformer has become an important technique for natural language processing tasks with great success. However, it usually requires huge storage space and computational cost, making it difficult to be deployed on resource-constrained edge devices. To compress and accelerate Transformer, we propose LightFormer, which adopts a low-rank factorization initialized by SVD-based weight transfer and parameter sharing. The SVD-based weight transfer can effectively utilize the well-trained Transformer parameter knowledge to speed up the model convergence, and effectively alleviate the low-rank bottleneck problem combined with parameter sharing. We validate our method on machine translation, text summarization and text classification tasks. Experiments show that on IWSLT’14 De-En and WMT’14 En-De, LightFormer achieves similar performance to the baseline Transformer with 3.8 times and 1.8 times fewer parameters, and achieves 2.3 times speedup and 1.5 times speedup respectively, generally outperforming recent light-weight Transformers.

pdf bib
An Intent-based and Annotation-free Method for Duplicate Question Detection in CQA Forums
Yubo Shu | Hansu Gu | Peng Zhang | Tun Lu | Ning Gu
Findings of the Association for Computational Linguistics: EMNLP 2023

With the advent of large language models (LLMs), Community Question Answering (CQA) forums offer well-curated questions and answers that can be utilized for instruction-tuning, effectively training LLMs to be aligned with human intents. However, the issue of duplicate questions arises as the volume of content within CQA continues to grow, posing a threat to content quality. Recent research highlights the benefits of detecting and eliminating duplicate content. It not only enhances the LLMs’ ability to generalize across diverse intents but also improves the efficiency of training data utilization while addressing concerns related to information leakage. However, existing methods for detecting duplicate questions in CQA typically rely on generic text-pair matching models, overlooking the intent behind the questions. In this paper, we propose a novel intent-based duplication detector named Intent-DQD that comprehensively leverages intent information to address the problem of duplicate question detection in CQA. Intent-DQD first leverages the characteristics in CQA forums and extracts training labels to recognize and match intents without human annotation. Intent-DQD then effectively aggregates intent-level relations and establishes question-level relations to enable intent-aware duplication detection. Experimental results on fifteen distinct domains from both CQADupStack and Stack Overflow datasets demonstrate the effectiveness of Intent-DQD. Reproducible codes and datasets will be released upon publication of the paper.

pdf bib
Rumor Detection on Social Media with Crowd Intelligence and ChatGPT-Assisted Networks
Chang Yang | Peng Zhang | Wenbo Qiao | Hui Gao | Jiaming Zhao
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

In the era of widespread dissemination through social media, the task of rumor detection plays a pivotal role in establishing a trustworthy and reliable information environment. Nonetheless, existing research on rumor detection confronts several challenges: the limited expressive power of text encoding sequences, difficulties in domain knowledge coverage and effective information extraction with knowledge graph-based methods, and insufficient mining of semantic structural information. To address these issues, we propose a Crowd Intelligence and ChatGPT-Assisted Network(CICAN) for rumor classification. Specifically, we present a crowd intelligence-based semantic feature learning module to capture textual content’s sequential and hierarchical features. Then, we design a knowledge-based semantic structural mining module that leverages ChatGPT for knowledge enhancement. Finally, we construct an entity-sentence heterogeneous graph and design Entity-Aware Heterogeneous Attention to effectively integrate diverse structural information meta-paths. Experimental results demonstrate that CICAN achieves performance improvement in rumor detection tasks, validating the effectiveness and rationality of using large language models as auxiliary tools.

pdf bib
XDailyDialog: A Multilingual Parallel Dialogue Corpus
Zeming Liu | Ping Nie | Jie Cai | Haifeng Wang | Zheng-Yu Niu | Peng Zhang | Mrinmaya Sachan | Kaiping Peng
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

High-quality datasets are significant to the development of dialogue models. However, most existing datasets for open-domain dialogue modeling are limited to a single language. The absence of multilingual open-domain dialog datasets not only limits the research on multilingual or cross-lingual transfer learning, but also hinders the development of robust open-domain dialog systems that can be deployed in other parts of the world. In this paper, we provide a multilingual parallel open-domain dialog dataset, XDailyDialog, to enable researchers to explore the challenging task of multilingual and cross-lingual open-domain dialog. XDailyDialog includes 13K dialogues aligned across 4 languages (52K dialogues and 410K utterances in total). We then propose a dialog generation model, kNN-Chat, which has a novel kNN-search mechanism to support unified response retrieval for monolingual, multilingual, and cross-lingual dialogue. Experiment results show the effectiveness of this framework. We will make XDailyDialog and kNN-Chat publicly available soon.

pdf bib
VisKoP: Visual Knowledge oriented Programming for Interactive Knowledge Base Question Answering
Zijun Yao | Yuanyong Chen | Xin Lv | Shulin Cao | Amy Xin | Jifan Yu | Hailong Jin | Jianjun Xu | Peng Zhang | Lei Hou | Juanzi Li
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

We present Visual Knowledge oriented Programming platform (VisKoP), a knowledge base question answering (KBQA) system that integrates human into the loop to edit and debug the knowledge base (KB) queries. VisKoP not only provides a neural program induction module, which converts natural language questions into knowledge oriented program language (KoPL), but also maps KoPL programs into graphical elements. KoPL programs can be edited with simple graphical operators, such as ”dragging” to add knowledge operators and ”slot filling” to designate operator arguments. Moreover, VisKoP provides auto-completion for its knowledge base schema and users can easily debug the KoPL program by checking its intermediate results. To facilitate the practical KBQA on a million-entity-level KB, we design a highly efficient KoPL execution engine for the back-end. Experiment results show that VisKoP is highly efficient and user interaction can fix a large portion of wrong KoPL programs to acquire the correct answer. The VisKoP online demo, highly efficient KoPL engine, and screencast video are now publicly available.

pdf bib
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
Shicheng Tan | Weng Lam Tam | Yuanchun Wang | Wenwen Gong | Shu Zhao | Peng Zhang | Jie Tang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

Currently, the reduction in the parameter scale of large-scale pre-trained language models (PLMs) through knowledge distillation has greatly facilitated their widespread deployment on various devices. However, the deployment of knowledge distillation systems faces great challenges in real-world industrial-strength applications, which require the use of complex distillation methods on even larger-scale PLMs (over 10B), limited by memory on GPUs and the switching of methods. To overcome these challenges, we propose GKD, a general knowledge distillation framework that supports distillation on larger-scale PLMs using various distillation methods. With GKD, developers can build larger distillation models on memory-limited GPUs and easily switch and combine different distillation methods within a single framework. Experimental results show that GKD can support the distillation of at least 100B-scale PLMs and 25 mainstream methods on 8 NVIDIA A100 (40GB) GPUs.

2022

pdf bib
Hypoformer: Hybrid Decomposition Transformer for Edge-friendly Neural Machine Translation
Sunzhu Li | Peng Zhang | Guobing Gan | Xiuqing Lv | Benyou Wang | Junqiu Wei | Xin Jiang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Transformer has been demonstrated effective in Neural Machine Translation (NMT). However, it is memory-consuming and time-consuming in edge devices, resulting in some difficulties for real-time feedback. To compress and accelerate Transformer, we propose a Hybrid Tensor-Train (HTT) decomposition, which retains full rank and meanwhile reduces operations and parameters. A Transformer using HTT, named Hypoformer, consistently and notably outperforms the recent light-weight SOTA methods on three standard translation tasks under different parameter and speed scales. In extreme low resource scenarios, Hypoformer has 7.1 points absolute improvement in BLEU and 1.27 X speedup than vanilla Transformer on IWSLT’14 De-En task.

pdf bib
A Sequential Flow Control Framework for Multi-hop Knowledge Base Question Answering
Minghui Xie | Chuzhan Hao | Peng Zhang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

One of the key challenges of knowledge base question answering (KBQA) is the multi-hop reasoning. Since in different hops, one attends to different parts of question, it is important to dynamically represent the question semantics for each hop. Existing methods, however, (i) infer the dynamic question representation only through coarse-grained attention mechanisms, which may bring information loss, (ii) and have not effectively modeled the sequential logic, which is crucial for the multi-hop reasoning process in KBQA.To address these issues, we propose a sequential reasoning self-attention mechanism to capture the crucial reasoning information of each single hop in a more fine-grained way. Based on Gated Recurrent Unit (GRU) which is good at modeling sequential process, we propose a simple but effective GRU-inspired Flow Control (GFC) framework to model sequential logic in the whole multi-hop process.Extensive experiments on three popular benchmark datasets have demonstrated the superior effectiveness of our model. In particular, GFC achieves new state-of-the-art Hits@1 of 76.8% on WebQSP and is also effective when KB is incomplete. Our code and data are available at https://github.com/Xie-Minghui/GFC.

pdf bib
ACENet: Attention Guided Commonsense Reasoning on Hybrid Knowledge Graph
Chuzhan Hao | Minghui Xie | Peng Zhang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Augmenting pre-trained language models (PLMs) with knowledge graphs (KGs) has demonstrated superior performance on commonsense reasoning. Given a commonsense based QA context (question and multiple choices), existing approaches usually estimate the plausibility of candidate choices separately based on their respective retrieved KGs, without considering the interference among different choices. In this paper, we propose an Attention guided Commonsense rEasoning Network (ACENet) to endow the neural network with the capability of integrating hybrid knowledge. Specifically, our model applies the multi-layer interaction of answer choices to continually strengthen correct choice information and guide the message passing of GNN. In addition, we also design a mix attention mechanism of nodes and edges to iteratively select supporting evidence on hybrid knowledge graph. Experimental results demonstrate the effectiveness of our proposed model through considerable performance gains across CommonsenseQA and OpenbookQA datasets.

pdf bib
QaDialMoE: Question-answering Dialogue based Fact Verification with Mixture of Experts
Longzheng Wang | Peng Zhang | Xiaoyu Lu | Lei Zhang | Chaoyang Yan | Chuang Zhang
Findings of the Association for Computational Linguistics: EMNLP 2022

Fact verification is an essential tool to mitigate the spread of false information online, which has gained a widespread attention recently. However, a fact verification in the question-answering dialogue is still underexplored. In this paper, we propose a neural network based approach called question-answering dialogue based fact verification with mixture of experts (QaDialMoE). It exploits questions and evidence effectively in the verification process and can significantly improve the performance of fact verification. Specifically, we exploit the mixture of experts to focus on various interactions among responses, questions and evidence. A manager with an attention guidance module is implemented to guide the training of experts and assign a reasonable attention score to each expert. A prompt module is developed to generate synthetic questions that make our approach more generalizable. Finally, we evaluate the QaDialMoE and conduct a comparative study on three benchmark datasets. The experimental results demonstrate that our QaDialMoE outperforms previous approaches by a large margin and achieves new state-of-the-art results on all benchmarks. This includes the accuracy improvements on the HEALTHVER as 84.26%, the FAVIQ A dev set as 78.7%, the FAVIQ R dev set as 86.1%, test set as 86.0%, and the COLLOQUIAL as 89.5%. To our best knowledge, this is the first work to investigate a question-answering dialogue based fact verification, and achieves new state-of-the-art results on various benchmark datasets.

pdf bib
ClusterFormer: Neural Clustering Attention for Efficient and Effective Transformer
Ningning Wang | Guobing Gan | Peng Zhang | Shuai Zhang | Junqiu Wei | Qun Liu | Xin Jiang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recently, a lot of research has been carried out to improve the efficiency of Transformer. Among them, the sparse pattern-based method is an important branch of efficient Transformers. However, some existing sparse methods usually use fixed patterns to select words, without considering similarities between words. Other sparse methods use clustering patterns to select words, but the clustering process is separate from the training process of the target task, which causes a decrease in effectiveness. To address these limitations, we design a neural clustering method, which can be seamlessly integrated into the Self-Attention Mechanism in Transformer. The clustering task and the target task are jointly trained and optimized to benefit each other, leading to significant effectiveness improvement. In addition, our method groups the words with strong dependencies into the same cluster and performs the attention mechanism for each cluster independently, which improves the efficiency. We verified our method on machine translation, text classification, natural language inference, and text matching tasks. Experimental results show that our method outperforms two typical sparse attention methods, Reformer and Routing Transformer while having a comparable or even better time and memory efficiency.

pdf bib
HOSMEL: A Hot-Swappable Modularized Entity Linking Toolkit for Chinese
Daniel Zhang-li | Jing Zhang | Jifan Yu | Xiaokang Zhang | Peng Zhang | Jie Tang | Juanzi Li
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We investigate the usage of entity linking (EL)in downstream tasks and present the first modularized EL toolkit for easy task adaptation. Different from the existing EL methods that dealwith all the features simultaneously, we modularize the whole model into separate parts witheach feature. This decoupled design enablesflexibly adding new features without retraining the whole model as well as flow visualization with better interpretability of the ELresult. We release the corresponding toolkit,HOSMEL, for Chinese, with three flexible usage modes, a live demo, and a demonstrationvideo. Experiments on two benchmarks forthe question answering task demonstrate thatHOSMEL achieves much less time and spaceconsumption as well as significantly better accuracy performance compared with existingSOTA EL methods. We hope the release ofHOSMEL will call for more attention to studyEL for downstream tasks in non-English languages.

pdf bib
DoSEA: A Domain-specific Entity-aware Framework for Cross-Domain Named Entity Recogition
Minghao Tang | Peng Zhang | Yongquan He | Yongxiu Xu | Chengpeng Chao | Hongbo Xu
Proceedings of the 29th International Conference on Computational Linguistics

Cross-domain named entity recognition aims to improve performance in a target domain with shared knowledge from a well-studied source domain. The previous sequence-labeling based method focuses on promoting model parameter sharing among domains. However, such a paradigm essentially ignores the domain-specific information and suffers from entity type conflicts. To address these issues, we propose a novel machine reading comprehension based framework, named DoSEA, which can identify domain-specific semantic differences and mitigate the subtype conflicts between domains. Concretely, we introduce an entity existence discrimination task and an entity-aware training setting, to recognize inconsistent entity annotations in the source domain and bring additional reference to better share information across domains. Experiments on six datasets prove the effectiveness of our DoSEA. Our source code can be obtained from https://github.com/mhtang1995/DoSEA.

2021

pdf bib
Natural Language Processing Meets Quantum Physics: A Survey and Categorization
Sixuan Wu | Jian Li | Peng Zhang | Yue Zhang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent research has investigated quantum NLP, designing algorithms that process natural language in quantum computers, and also quantum-inspired algorithms that improve NLP performance on classical computers. In this survey, we review representative methods at the intersection of NLP and quantum physics in the past ten years, categorizing them according to the use of quantum theory, the linguistic targets that are modeled, and the downstream application. The literature review ends with a discussion on the key factors to the success that has been achieved by existing work, as well as challenges ahead, with the goal of better understanding the promises and further directions.

2019

pdf bib
CASA-NLU: Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots
Arshit Gupta | Peng Zhang | Garima Lalwani | Mona Diab
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Natural Language Understanding (NLU) is a core component of dialog systems. It typically involves two tasks - Intent Classification (IC) and Slot Labeling (SL), which are then followed by a dialogue management (DM) component. Such NLU systems cater to utterances in isolation, thus pushing the problem of context management to DM. However, contextual information is critical to the correct prediction of intents in a conversation. Prior work on contextual NLU has been limited in terms of the types of contextual signals used and the understanding of their impact on the model. In this work, we propose a context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals over a variable context window, such as previous intents, slots, dialog acts and utterances, in addition to the current user utterance. CASA-NLU outperforms a recurrent contextual NLU baseline on two conversational datasets, yielding a gain of up to 7% on the IC task. Moreover, a non-contextual variant of CASA-NLU achieves state-of-the-art performance on standard public datasets - SNIPS and ATIS.

2015

pdf bib
Reinforcing the Topic of Embeddings with Theta Pure Dependence for Text Classification
Ning Xing | Yuexian Hou | Peng Zhang | Wenjie Li | Dawei Song
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2010

pdf bib
Event-Based Hyperspace Analogue to Language for Query Expansion
Tingxu Yan | Tamsin Maxwell | Dawei Song | Yuexian Hou | Peng Zhang
Proceedings of the ACL 2010 Conference Short Papers

2008

pdf bib
Exploiting the Role of Position Feature in Chinese Relation Extraction
Peng Zhang | Wenjie Li | Furu Wei | Qin Lu | Yuexian Hou
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Relation extraction is the task of finding pre-defined semantic relations between two entities or entity mentions from text. Many methods, such as feature-based and kernel-based methods, have been proposed in the literature. Among them, feature-based methods draw much attention from researchers. However, to the best of our knowledge, existing feature-based methods did not explicitly incorporate the position feature and no in-depth analysis was conducted in this regard. In this paper, we define and exploit nine types of position information between two named entity mentions and then use it along with other features in a multi-class classification framework for Chinese relation extraction. Experiments on the ACE 2005 data set show that the position feature is more effective than the other recognized features like entity type/subtype and character-based N-gram context. Most important, it can be easily captured and does not require as much effort as applying deep natural language processing.

pdf bib
A Novel Feature-based Approach to Chinese Entity Relation Extraction
Wenjie Li | Peng Zhang | Furu Wei | Yuexian Hou | Qin Lu
Proceedings of ACL-08: HLT, Short Papers