2024
Rethinking Machine Ethics – Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
Jingyan Zhou | Minda Hu | Junan Li | Xiaoying Zhang | Xixin Wu | Irwin King | Helen Meng
Findings of the Association for Computational Linguistics: NAACL 2024
Making moral judgments is an essential step toward developing ethical AI systems. Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality. These approaches have been criticized for potentially overgeneralizing a limited group of annotators’ moral stances and for lacking explainability. This work proposes a flexible top-down framework to steer (Large) Language Models to perform moral reasoning with well-established moral theories from interdisciplinary research. The theory-guided top-down framework can incorporate various moral theories. Our experiments demonstrate the effectiveness of the proposed framework on datasets derived from moral theories. Furthermore, we show the alignment between different moral theories and existing morality datasets. Our analysis reveals both the potential and the flaws of existing resources (models and datasets) for developing explainable moral judgment-making systems.
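Since the framework steers off-the-shelf LLMs with theory-specific instructions, the basic mechanics can be illustrated in a few lines. The sketch below is an illustrative reconstruction, not the authors' released prompts: the theory summaries, the `build_prompt` wrapper, and the `call_llm` stub are all assumptions standing in for a real chat-completion API.

```python
# Minimal sketch of theory-guided, top-down prompting.
# The theory texts and `call_llm` are illustrative placeholders.

THEORIES = {
    "deontology": "Judge the action by whether it conforms to moral duties and rules.",
    "utilitarianism": "Judge the action by whether it maximizes overall well-being.",
    "virtue_ethics": "Judge the action by whether a virtuous person would perform it.",
}

def build_prompt(theory: str, scenario: str) -> str:
    """Compose a prompt that asks the model to reason under one theory."""
    return (
        f"You are a moral reasoner guided by {theory}.\n"
        f"Principle: {THEORIES[theory]}\n"
        f"Scenario: {scenario}\n"
        "Reason step by step under this principle, then output a verdict: "
        "'acceptable' or 'unacceptable'."
    )

def call_llm(prompt: str) -> str:
    # Placeholder for any chat-completion API; returns a canned reply here.
    return "Reasoning: ... Verdict: unacceptable"

scenario = "I read my sister's diary without asking her."
for theory in THEORIES:
    print(theory, "->", call_llm(build_prompt(theory, scenario)))
```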
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
Tianhua Zhang | Jiaxin Ge | Hongyin Luo | Yung-Sung Chuang | Mingye Gao | Yuan Gong | Yoon Kim | Xixin Wu | Helen Meng | James Glass
Findings of the Association for Computational Linguistics: NAACL 2024
How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning? We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction-following tasks. Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge. A Python interpreter then executes the generated code and prints the output. Despite using a task-general prompt, we find that this approach can improve upon strong baselines across a range of different tasks including math and symbolic reasoning, text classification, question answering, and instruction following. We find that the generated programs are interpretable, since they outline the exact reasoning process followed by the program interpreter.
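To make the idea concrete, here is a toy example of the kind of program an NLEP-style prompt might elicit: structured knowledge stored with natural-language fields, a function computing over it, and a printed answer. The task, data, and function names are invented for illustration and are not from the paper.

```python
# Toy NLEP-style program: natural-language knowledge in a data structure,
# a function over it, and the interpreter prints the answer.

knowledge = [
    {"name": "Mercury", "description": "closest planet to the sun", "moons": 0},
    {"name": "Earth", "description": "third planet from the sun", "moons": 1},
    {"name": "Mars", "description": "fourth planet from the sun", "moons": 2},
]

def answer_question(kb, question: str) -> str:
    """Resolve a simple counting question over the knowledge base."""
    if "how many moons" in question.lower():
        total = sum(entry["moons"] for entry in kb)
        return f"The listed planets have {total} moons in total."
    return "Unsupported question."

print(answer_question(knowledge, "How many moons do these planets have?"))
```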
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
Xiaoying Zhang | Baolin Peng | Ye Tian | Jingyan Zhou | Lifeng Jin | Linfeng Song | Haitao Mi | Helen Meng
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Despite showing impressive abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e., “hallucinations”, even when they hold relevant knowledge. To mitigate these hallucinations, current approaches typically necessitate high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses solely based on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM’s self-evaluation ability by improving the model’s confidence estimation and calibration. We then utilize these self-annotated responses to fine-tune the model via the Direct Preference Optimization algorithm. We show that the proposed self-alignment approach substantially enhances the factual accuracy of Llama-family models across three key knowledge-intensive tasks on TruthfulQA and BioGEN.
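A minimal sketch of the data-construction step, under the assumption that self-evaluation yields a scalar factuality score per sampled response: responses to the same prompt are paired into (chosen, rejected) preference data for a DPO trainer. The `self_eval` stub and the toy prompt are placeholders, not the paper's actual prompts or scoring.

```python
# Sketch: turn self-evaluation scores into DPO-style preference pairs.
# `self_eval` stands in for prompting the model to judge its own output.

from itertools import combinations

def self_eval(response: str) -> float:
    # Placeholder confidence that the response is factual (0..1).
    return 0.9 if "Paris" in response else 0.2

prompt = "What is the capital of France?"
samples = ["The capital of France is Paris.",
           "The capital of France is Lyon."]

scored = [(r, self_eval(r)) for r in samples]
pairs = []
for (r1, s1), (r2, s2) in combinations(scored, 2):
    if s1 != s2:
        chosen, rejected = (r1, r2) if s1 > s2 else (r2, r1)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})

print(pairs)  # self-annotated preference data for a DPO trainer
```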
COKE: A Cognitive Knowledge Graph for Machine Theory of Mind
Jincenzi Wu | Zhuang Chen | Jiawen Deng | Sahand Sabour | Helen Meng | Minlie Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Theory of mind (ToM) refers to humans’ ability to understand and infer the desires, beliefs, and intentions of others. The acquisition of ToM plays a key role in humans’ social cognition and interpersonal relations. Though indispensable for social intelligence, ToM is still lacking in modern AI and NLP systems, since they cannot access the human mental states and cognitive processes beneath the training corpora. To empower AI systems with the ToM ability and narrow the gap between them and humans, in this paper we propose COKE: the first cognitive knowledge graph for machine theory of mind. Specifically, COKE formalizes ToM as a collection of 45k+ manually verified cognitive chains that characterize human mental activities and subsequent behavioral/affective responses when facing specific social circumstances. In addition, we further generalize COKE using LLMs and build COLM, a powerful generation model tailored for cognitive reasoning. Experimental results in both automatic and human evaluation demonstrate the high quality of COKE, the superior ToM ability of COLM, and its potential to significantly enhance social applications.
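One plausible way to represent a single cognitive chain in code is sketched below; the node types follow the abstract's description (situation, thought, and behavioral/affective responses), but the field names and the example chain are invented.

```python
# Hedged sketch of how one COKE-style cognitive chain could be represented.

from dataclasses import dataclass

@dataclass
class CognitiveChain:
    situation: str   # the social circumstance
    clue: str        # observable evidence in the situation
    thought: str     # the inferred mental state
    action: str      # subsequent behavioral response
    emotion: str     # subsequent affective response

chain = CognitiveChain(
    situation="A friend fails an important exam",
    clue="They avoid talking about school",
    thought="They feel ashamed of the result",
    action="Offer encouragement instead of asking for details",
    emotion="Sympathy",
)
print(chain)
```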
2023
Search Augmented Instruction Learning
Hongyin Luo | Tianhua Zhang | Yung-Sung Chuang | Yuan Gong | Yoon Kim | Xixin Wu | Helen Meng | James Glass
Findings of the Association for Computational Linguistics: EMNLP 2023
Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information. In this work, we propose search-augmented instruction learning (SAIL), which grounds the language generation and instruction-following abilities on complex search results generated by in-house and external search engines. With an instruction tuning corpus, we collect search results for each training case from different search APIs and domains, and construct a new search-grounded training set containing (instruction, grounding information, response) triplets. We then fine-tune the LLaMA-7B model on the constructed training set. Since the collected results contain unrelated and conflicting content, the model needs to learn to ground on trustworthy search results, filter out distracting passages, and generate the target response. The search-result denoising process entails explicit trustworthy-information selection and multi-hop reasoning, since the retrieved passages might be informative but not contain the instruction-following answer. Experiments show that the fine-tuned SAIL-7B model has a strong instruction-following ability, and it performs significantly better on transparency-sensitive tasks, including open-ended question answering and fact checking.
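The core data construction reduces to attaching retrieved passages to each instruction-response pair. The sketch below assumes a generic `search` API stub and invented field names; the real corpus uses results from actual search engines.

```python
# Minimal sketch of assembling SAIL-style (instruction, grounding, response)
# training triplets. The search function is a stub for a real search API.

def search(query: str, k: int = 3):
    # Placeholder for an external search API call.
    return [f"(passage {i} retrieved for: {query})" for i in range(1, k + 1)]

def build_example(instruction: str, response: str) -> dict:
    passages = search(instruction)
    grounding = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return {
        "instruction": instruction,
        "grounding": grounding,   # may contain distracting passages
        "response": response,     # the model must learn to denoise
    }

example = build_example("Who wrote 'The Selfish Gene'?", "Richard Dawkins wrote it.")
print(example["grounding"])
```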
SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting
Xiaoying Zhang | Baolin Peng | Kun Li | Jingyan Zhou | Helen Meng
Findings of the Association for Computational Linguistics: EMNLP 2023
Building and maintaining end-to-end task bots using minimal human effort is a long-standing challenge in dialog research. In this work, we introduce SGP-TOD, Schema-Guided Prompting for building Task-Oriented Dialog systems effortlessly based on large language models (LLMs). Utilizing the predefined task schema, i.e., belief instruction and dialog policy, we instruct fixed LLMs to generate appropriate responses on novel tasks without the need for training data. Specifically, SGP-TOD comprises three components: an LLM for interacting with users, a Dialog State Tracking (DST) Prompter to aid the LLM in tracking dialog states with the given belief instruction, and a Policy Prompter to direct the LLM to generate proper responses adhering to the provided dialog policy. Experimental results on the MultiWOZ, RADDLE, and STAR datasets show that our training-free strategy, SGP-TOD, yields state-of-the-art (SOTA) zero-shot performance, significantly surpassing few-shot approaches. In a domain-extension setting, SGP-TOD aptly adapts to new functionalities by merely adding supplementary schema rules. We make our code and data publicly available.
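A rough reconstruction of the prompting flow: the schema's belief instruction feeds a DST prompt, and the policy rules plus the tracked belief state feed a response prompt for a frozen LLM. Schema contents, prompt wording, and the `llm` stub are all illustrative assumptions.

```python
# Illustrative SGP-TOD-style prompt assembly with an invented schema.

SCHEMA = {
    "belief_instruction": "Track slots: cuisine, area, price range.",
    "policy": [
        "If any slot is missing, ask for it.",
        "If all slots are filled, offer a matching restaurant.",
    ],
}

def dst_prompt(history: list) -> str:
    return (f"{SCHEMA['belief_instruction']}\n"
            "Dialog:\n" + "\n".join(history) + "\nBelief state:")

def policy_prompt(history: list, belief: str) -> str:
    rules = "\n".join(f"- {r}" for r in SCHEMA["policy"])
    return (f"Policy:\n{rules}\nBelief state: {belief}\n"
            "Dialog:\n" + "\n".join(history) + "\nSystem response:")

def llm(prompt: str) -> str:
    return "(model output)"  # placeholder for a frozen LLM call

history = ["User: I want somewhere cheap in the centre."]
belief = llm(dst_prompt(history))
print(llm(policy_prompt(history, belief)))
```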
ConvRGX: Recognition, Generation, and Extraction for Self-trained Conversational Question Answering
Tianhua Zhang | Liping Tang | Wei Fang | Hongyin Luo | Xixin Wu | Helen Meng | James Glass
Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
Collecting and constructing human-annotated corpora for training conversational question-answering (CQA) models has recently been shown to be inefficient and costly. To solve this problem, previous works have proposed training QA models with automatically generated QA data. In this work, we extend earlier studies on QA synthesis, and propose an efficient QA data generation algorithm under conversational settings. Our model recognizes potential dialogue topics, generates corresponding questions, and extracts answers from grounding passages. To improve the quality of generated QAs and downstream self-training of CQA models, we propose dropout and agreement-based QA selection methods. We conduct experiments on both data augmentation and domain adaptation settings. Experiments on the QuAC and Doc2Dial tasks show that the proposed method can significantly improve the quality of generated QA data, and also improves the accuracy of self-trained CQA models based on the constructed training corpora.
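The agreement-based selection idea can be sketched independently of the full system: run a stochastic answer extractor several times and keep a generated QA pair only if the runs agree. The extractor below is a random stub; thresholds and interfaces are invented.

```python
# Toy sketch of agreement-based QA selection for self-training.

import random
from collections import Counter

def extract_answer(passage: str, question: str) -> str:
    # Placeholder for an extractive QA model run with dropout active;
    # simulated here by sampling from candidate spans.
    return random.choice(["Marie Curie", "Marie Curie", "Pierre Curie"])

def agreement_filter(passage, question, n_runs=5, threshold=0.8):
    answers = [extract_answer(passage, question) for _ in range(n_runs)]
    top, count = Counter(answers).most_common(1)[0]
    ratio = count / n_runs
    return (top, ratio) if ratio >= threshold else (None, 0.0)

answer, score = agreement_filter("(grounding passage)", "Who discovered radium?")
print(answer, score)  # kept for self-training only if agreement is high
```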
2022
Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout
Kun Li | Tianhua Zhang | Liping Tang | Junan Li | Hongyuan Lu | Xixin Wu | Helen Meng
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
MultiDoc2Dial presents an important challenge on modeling dialogues grounded in multiple documents. This paper proposes a pipeline system of “retrieve, re-rank, and generate”, where each component is individually optimized. This enables the passage re-ranker and response generator to fully exploit training with ground-truth data. Furthermore, we use a deep cross-encoder trained with localized hard negative passages from the retriever. For the response generator, we use grounding span prediction as an auxiliary task to be jointly trained with the main task of response generation. We also adopt a passage dropout and regularization technique to improve response generation performance. Experimental results indicate that the system clearly surpasses the competitive baseline, and our team CPII-NLP ranked 1st among the public submissions on all four leaderboards based on the sum of F1, SacreBLEU, METEOR, and RougeL scores.
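Schematically, the pipeline composes three independently optimized stages. In this hedged sketch each stage is a trivial stand-in (keyword overlap for retrieval, dummy re-ranker and generator); the paper's actual components are a trained retriever, a cross-encoder re-ranker, and a generator with grounding-span prediction.

```python
# Schematic "retrieve, re-rank, generate" pipeline with stub stages.

def retrieve(query: str, corpus: list, k: int = 4) -> list:
    # Stand-in for a dense/sparse retriever: naive keyword overlap.
    scored = sorted(corpus, key=lambda p: -len(set(query.split()) & set(p.split())))
    return scored[:k]

def rerank(query: str, passages: list) -> list:
    # Stand-in for a cross-encoder scoring (query, passage) jointly.
    return sorted(passages, key=lambda p: -len(p))

def generate(query: str, passage: str) -> str:
    # Stand-in for the grounded response generator.
    return f"(response grounded in: {passage[:40]}...)"

corpus = [
    "Visa renewal requires form I-90 and a fee.",
    "Passport photos must be taken within six months.",
    "Driver licence renewal can be done online.",
]
query = "How do I renew my visa?"
top = rerank(query, retrieve(query, corpus))[0]
print(generate(query, top))
```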
COLD: A Benchmark for Chinese Offensive Language Detection
Jiawen Deng | Jingyan Zhou | Hao Sun | Chujie Zheng | Fei Mi | Helen Meng | Minlie Huang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Offensive language detection is increasingly crucial for maintaining a civilized social media platform and for deploying pre-trained language models. However, this task in Chinese is still underexplored due to the scarcity of reliable datasets. To this end, we propose COLD, a benchmark for Chinese offensive language analysis, comprising a Chinese Offensive Language Dataset (COLDATASET) and a baseline detector (COLDETECTOR) trained on the dataset. We show that the COLD benchmark contributes to Chinese offensive language detection, which is challenging for existing resources. We then deploy COLDETECTOR and conduct detailed analyses of popular Chinese pre-trained language models. We first analyze the offensiveness of existing generative models and show that these models inevitably expose varying degrees of offensive issues. Furthermore, we investigate the factors that influence offensive generations, and we find that anti-bias content and keywords referring to certain groups or revealing negative attitudes more easily trigger offensive outputs.
Toward Self-Learning End-to-End Task-oriented Dialog Systems
Xiaoying Zhang | Baolin Peng | Jianfeng Gao | Helen Meng
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue
End-to-end task bots are typically learned over a static and usually limited-size corpus. However, when deployed in dynamic, changing, and open environments to interact with users, task bots tend to fail when confronted with data that deviate from the training corpus, i.e., out-of-distribution samples. In this paper, we study the problem of automatically adapting task bots to changing environments by learning from human-bot interactions with minimal or zero human annotation. We propose SL-Agent, a novel self-learning framework for building end-to-end task bots. SL-Agent consists of a dialog model and a pre-trained reward model to predict the quality of an agent response. It enables task bots to automatically adapt to changing environments by learning from the unlabeled human-bot dialog logs accumulated after deployment, via reinforcement learning with the incorporated reward model. Experimental results on four well-studied dialog tasks, using both automatic and human evaluations, show the effectiveness of SL-Agent in automatically adapting to changing environments. We will release code and data for further research.
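The self-learning loop can be summarized as: score each unlabeled post-deployment dialog with the reward model, then use the scores as an RL signal. The sketch below stubs out both models and stops short of the actual policy update, which the paper performs with reinforcement learning.

```python
# Condensed sketch of the SL-Agent self-learning loop with stub models.

def dialog_model(context: str) -> str:
    return "(agent response)"           # placeholder policy

def reward_model(context: str, response: str) -> float:
    return 0.7                          # placeholder quality score in [0, 1]

def self_learning_step(logs: list):
    updates = []
    for context in logs:                # unlabeled post-deployment dialogs
        response = dialog_model(context)
        reward = reward_model(context, response)
        updates.append((context, response, reward))
    # In the real framework, these (context, response, reward) tuples drive
    # an RL update of the dialog model; here we just collect them.
    return updates

print(self_learning_step(["User: book a table for two tonight."]))
```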
On Controlling Fallback Responses for Grounded Dialogue Generation
Hongyuan Lu | Wai Lam | Hong Cheng | Helen Meng
Findings of the Association for Computational Linguistics: ACL 2022
Dialogue agents can leverage external textual knowledge to generate responses of higher quality. To the best of our knowledge, most existing works on knowledge-grounded dialogue settings assume that the user intention is always answerable. Unfortunately, this is impractical, as there is no guarantee that the knowledge retriever can always retrieve the desired knowledge. It is therefore crucial to incorporate fallback responses that handle unanswerable contexts appropriately while responding to answerable contexts in an informative manner. We propose a novel framework that automatically generates a control token with the generator to bias the succeeding response towards informativeness for answerable contexts and towards fallback for unanswerable contexts, in an end-to-end manner. Since no existing knowledge-grounded dialogue dataset considers this aim, we augment an existing dataset with unanswerable contexts to conduct our experiments. Automatic and human evaluation results indicate that naively incorporating fallback responses with controlled text generation still hurts informativeness for answerable contexts. In contrast, our proposed framework effectively mitigates this problem while still appropriately presenting fallback responses to unanswerable contexts. Our framework also reduces the extra burden of the additional classifier and the overheads introduced in previous works, which operate in a pipeline manner.
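The control-token idea admits a very small illustration: the generator's first token selects between an informative response and a fallback. In a trained model this token is predicted end-to-end; the stub below fakes that decision with an availability check, and the token names are invented.

```python
# Toy illustration of control-token conditioning for fallback responses.

INFORM, FALLBACK = "<inform>", "<fallback>"

def generate_with_control(context, knowledge=None):
    # A trained model would predict the control token itself; this stub
    # mimics that decision with a simple knowledge-availability check.
    token = INFORM if knowledge else FALLBACK
    if token == INFORM:
        return f"{token} Based on our records: {knowledge}"
    return f"{token} I'm sorry, I couldn't find that information."

print(generate_with_control("When does the pool open?", "The pool opens at 7am."))
print(generate_with_control("When does the gym close?"))
```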
Towards Identifying Social Bias in Dialog Systems: Framework, Dataset, and Benchmark
Jingyan Zhou | Jiawen Deng | Fei Mi | Yitong Li | Yasheng Wang | Minlie Huang | Xin Jiang | Qun Liu | Helen Meng
Findings of the Association for Computational Linguistics: EMNLP 2022
Among all the safety concerns that hinder the deployment of open-domain dialog systems (e.g., offensive languages, biases, and toxic behaviors), social bias presents an insidious challenge. Addressing this challenge requires rigorous analyses and normative reasoning. In this paper, we focus our investigation on social bias measurement to facilitate the development of unbiased dialog systems. We first propose a novel Dial-Bias Framework for analyzing the social bias in conversations using a holistic method beyond bias lexicons or dichotomous annotations. Leveraging the proposed framework, we further introduce the CDial-Bias Dataset which is, to the best of our knowledge, the first annotated Chinese social bias dialog dataset. We also establish a fine-grained dialog bias measurement benchmark and conduct in-depth ablation studies to shed light on the utility of the detailed annotations in the proposed dataset. Finally, we evaluate representative Chinese generative models with our classifiers to unveil the presence of social bias in these systems.
Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis
Xueyuan Chen | Shun Lei | Zhiyong Wu | Dong Xu | Weifeng Zhao | Helen Meng
Proceedings of the 29th International Conference on Computational Linguistics
Naturalness and expressiveness are crucial for audiobook speech synthesis, but are currently limited by averaged, global-scale speaking style representations. In this paper, we propose an unsupervised multi-scale context-sensitive text-to-speech model for audiobooks. A multi-scale hierarchical context encoder is specially designed to predict both a global-scale context style embedding and local-scale context style embeddings from a wider context of the input text in a hierarchical manner. Likewise, a multi-scale reference encoder is introduced to extract reference style embeddings at both global and local scales from the reference speech, which are used to guide the prediction of speaking styles. On top of these, a bi-reference attention mechanism is used to align both the local-scale reference style embedding sequence and the local-scale context style embedding sequence with the corresponding phoneme embedding sequence. Both objective and subjective experimental results on a real-world multi-speaker Mandarin novel audio dataset demonstrate the excellent performance of our proposed method over all baselines in terms of naturalness and expressiveness of the synthesized speech.
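Stripped of the encoders and attention, the conditioning step combines one utterance-level style vector with per-phoneme style vectors. The numpy sketch below shows only that combination, with made-up dimensions; the hierarchical encoders and bi-reference attention are not reproduced.

```python
# Highly simplified sketch of multi-scale style conditioning.

import numpy as np

n_phonemes, dim = 5, 8
phoneme_emb = np.random.randn(n_phonemes, dim)   # phoneme sequence
global_style = np.random.randn(dim)              # one vector per utterance
local_style = np.random.randn(n_phonemes, dim)   # one vector per phoneme

# Broadcast the global style across the sequence and add the aligned
# local styles (alignment is done with bi-reference attention in the paper).
conditioned = phoneme_emb + global_style[None, :] + local_style
print(conditioned.shape)  # (5, 8): input to the acoustic decoder
```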
Partner Personas Generation for Dialogue Response Generation
Hongyuan Lu | Wai Lam | Hong Cheng | Helen Meng
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Incorporating persona information allows diverse and engaging responses in dialogue response generation. Unfortunately, prior works have primarily focused on self personas and have overlooked the value of partner personas. Moreover, in practical applications, gold partner personas are often unavailable. This paper attempts to tackle these issues by offering a novel framework that leverages automatic partner persona generation to enhance the succeeding dialogue response generation. Our framework employs reinforcement learning with a dedicated critic network for reward judgment. Experimental results from automatic and human evaluations indicate that our framework is capable of generating relevant, interesting, coherent, and informative partner personas, even compared to the ground-truth partner personas. This enhances the succeeding dialogue response generation, which surpasses our competitive baselines that condition on the ground-truth partner personas.
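The framework's two-stage structure, with a critic providing rewards, can be outlined as follows. All three components are stubs standing in for trained models, and the example values are invented.

```python
# Schematic two-stage pipeline: generate partner personas, then condition
# response generation on them, with a critic scoring the result for RL.

def persona_generator(history):
    return ["I enjoy hiking.", "I have two dogs."]  # predicted partner personas

def response_generator(history, personas):
    return f"(response conditioned on: {'; '.join(personas)})"

def critic(history, personas, response) -> float:
    return 0.8  # placeholder reward used to update the persona generator

history = ["A: What did you do this weekend?"]
personas = persona_generator(history)
response = response_generator(history, personas)
print(response, "reward:", critic(history, personas, response))
```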
2015
Analysis of Dysarthric Speech using Distinctive Feature Recognition
Ka Ho Wong | Yu Ting Yeung | Patrick C. M. Wong | Gina-Anne Levow | Helen Meng
Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies
Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings
Pengfei Liu | Shafiq Joty | Helen Meng
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
2014
SeemGo: Conditional Random Fields Labeling and Maximum Entropy Classification for Aspect Based Sentiment Analysis
Pengfei Liu | Helen Meng
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)
2009
Automatic Story Segmentation using a Bayesian Decision Framework for Statistical Models of Lexical Chain Features
Wai-Kit Lo | Wenying Xiong | Helen Meng
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Developing Speech Recognition and Synthesis Technologies to Support Computer-Aided Pronunciation Training for Chinese Learners of English
Helen Meng
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1
2007
Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News
Lei Xie | Chuan Liu | Helen Meng
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
2006
A Maximum Entropy Framework that Integrates Word Dependencies and Grammatical Relations for Reading Comprehension
Kui Xu | Helen Meng | Fuliang Weng
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
2005
Design and Development of a Bilingual Reading Comprehension Corpus
Kui Xu | Helen Meng
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 2, June 2005: Special Issue on Annotated Speech Corpora
The Use of Metadata, Web-derived Answer Patterns and Passage Context to Improve Reading Comprehension Performance
Yongping Du | Helen Meng | Xuanjing Huang | Lide Wu
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing
2001
Design, Compilation and Processing of CUCall: A Set of Cantonese Spoken Language Corpora Collected Over Telephone Networks
W.K. Lo | P.C. Ching | Tan Lee | Helen Meng
Proceedings of Research on Computational Linguistics Conference XIV
Automatic Grammar Partitioning for Syntactic Parsing
Po Chui Luk | Fuliang Weng | Helen Meng
Proceedings of the Seventh International Workshop on Parsing Technologies
Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval
Helen Meng | Berlin Chen | Sanjeev Khudanpur | Gina-Anne Levow | Wai-Kit Lo | Douglas Oard | Patrick Schone | Karen Tang | Hsin-Min Wang | Jianqiang Wang
Proceedings of the First International Conference on Human Language Technology Research
Scalability and Portability of a Belief Network-based Dialog Model for Different Application Domains
Carmen Wai | Helen M. Meng | Roberto Pieraccini
Proceedings of the First International Conference on Human Language Technology Research
2000
Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval
Helen Meng | Sanjeev Khudanpur | Gina Levow | Douglas W. Oard | Hsin-Min Wang
ANLP-NAACL 2000 Workshop: Embedded Machine Translation Systems
Parsing a Lattice with Multiple Grammars
Fuliang Weng | Helen Meng | Po Chui Luk
Proceedings of the Sixth International Workshop on Parsing Technologies
Efficiency, memory, ambiguity, robustness and scalability are the central issues in natural language parsing. Because of the complexity of natural language, different parsers may be suited only to certain subgrammars. In addition, grammar maintenance and updating may have adverse effects on tuned parsers. Motivated by these concerns, [25] proposed a grammar partitioning and top-down parser composition mechanism for loosely restricted Context-Free Grammars (CFGs). In this paper, we report on significant progress: (1) developing guidelines for grammar partitioning through a set of heuristics, (2) devising a new mixed-strategy composition algorithm for any rule-based grammar partition in a lattice framework, and (3) obtaining initial but encouraging parsing results for Chinese and English queries from an Air Travel Information System (ATIS) corpus.
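As a toy illustration of the grammar-partitioning idea (not the paper's algorithm), the sketch below splits a small CFG's rules into two subgrammars and identifies the nonterminals that cross the partition, which are exactly the points where a composition algorithm must coordinate sub-parsers. The grammar and partition are invented.

```python
# Toy rule-based grammar partitioning: group CFG rules by the owner of
# their left-hand side, and collect cross-partition interface symbols.

RULES = [
    ("QUERY", ["SHOW", "FLIGHTS"]),
    ("SHOW", ["show", "me"]),
    ("FLIGHTS", ["flights", "FROMTO"]),
    ("FROMTO", ["from", "CITY", "to", "CITY"]),
    ("CITY", ["boston"]), ("CITY", ["denver"]),
]
PARTITION = {"top": {"QUERY", "SHOW", "FLIGHTS"}, "loc": {"FROMTO", "CITY"}}

def split_grammar(rules, partition):
    subgrammars = {name: [] for name in partition}
    interfaces = set()
    for lhs, rhs in rules:
        owner = next(n for n, nts in partition.items() if lhs in nts)
        subgrammars[owner].append((lhs, rhs))
        for sym in rhs:  # symbols owned by another subgrammar are interfaces
            for name, nts in partition.items():
                if sym in nts and name != owner:
                    interfaces.add(sym)
    return subgrammars, interfaces

subs, ifaces = split_grammar(RULES, PARTITION)
print(ifaces)  # {'FROMTO'}: where the two sub-parsers must compose
```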
1999
An Analytical Study of Transformational Tagging for Chinese Text
Helen Meng | Chun Wah Ip
ROCLING 1999 Short Papers
1994
Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation
Helen M. Meng | Stephanie Seneff | Victor W. Zue
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994
1991
Signal Representation, Attribute Extraction and the Use of Distinctive Features for Phonetic Classification
Helen M. Meng | Victor W. Zue | Hong C. Leung
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991