2024
Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models
Yi Luo | Zhenghao Lin | YuHao Zhang | Jiashuo Sun | Chen Lin | Chengjin Xu | Xiangdong Su | Yelong Shen | Jian Guo | Yeyun Gong
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large Language Models (LLMs) exhibit impressive capabilities but also present risks such as biased content generation and privacy issues. One of the current alignment techniques is principle-driven integration, but it faces challenges arising from the imprecision of manually crafted rules and the inadequate risk perception of models without safety training. To address these issues, we introduce Guide-Align, a two-stage approach. Initially, a safety-trained model identifies potential risks and formulates specific guidelines for various inputs, establishing a comprehensive guideline library and a model for input-guideline retrieval. Subsequently, the retrieval model correlates new inputs with relevant guidelines, which guide LLMs in response generation to ensure safe and high-quality outputs, thereby aligning with human values. An additional optional stage involves fine-tuning a model on well-aligned datasets generated through the process implemented in the second stage. Our method customizes guidelines to accommodate diverse inputs, thereby enhancing the granularity and comprehensiveness of the guideline library. Furthermore, it incorporates safety expertise from a safety-trained LLM through a lightweight retrieval model. We evaluate our approach on three benchmarks, demonstrating significant improvements in LLM security and quality. Notably, our fine-tuned model, Labrador, at only 13 billion parameters, outperforms GPT-3.5-turbo and surpasses GPT-4 in alignment capabilities.
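A minimal sketch of the retrieve-then-generate stage described above, under stated assumptions: `embed` stands in for the lightweight retrieval model, `llm_generate` for the response LLM, and the guideline library is a plain list of (text, embedding) pairs; none of these names or structures come from the paper.

```python
# Sketch of Guide-Align's second stage under stated assumptions: `embed` and
# `llm_generate` are hypothetical callables supplied by the caller, and the
# guideline library is a list of (guideline text, embedding) pairs.
import math
from typing import Callable, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-9)


def retrieve_guidelines(query_vec: List[float],
                        library: List[Tuple[str, List[float]]],
                        top_k: int = 3) -> List[str]:
    """Return the top-k guidelines most similar to the input embedding."""
    ranked = sorted(library, key=lambda g: cosine(query_vec, g[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def guided_generate(user_input: str,
                    embed: Callable[[str], List[float]],
                    llm_generate: Callable[[str], str],
                    library: List[Tuple[str, List[float]]]) -> str:
    """Prepend the retrieved guidelines to the prompt so the LLM follows them."""
    guidelines = retrieve_guidelines(embed(user_input), library)
    prompt = ("Follow these guidelines when answering:\n"
              + "\n".join(f"- {g}" for g in guidelines)
              + f"\n\nUser: {user_input}\nAssistant:")
    return llm_generate(prompt)
```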
Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models
Jiashuo Sun | Yi Luo | Yeyun Gong | Chen Lin | Yelong Shen | Jian Guo | Nan Duan
Findings of the Association for Computational Linguistics: NAACL 2024
Large language models (LLMs) can achieve impressive performance on various reasoning tasks by incorporating chain-of-thought (CoT) prompting, where step-by-step reasoning is provided to guide LLMs to generate answers to questions, and the question-rationale-answer triplets are utilized as demonstration exemplars. However, the reasoning chains of demonstrations generated by LLMs are observed to be prone to errors, which can subsequently lead to incorrect reasoning during inference. Furthermore, inappropriate exemplars, e.g., exemplars that are overly simplistic or complex relative to the question’s difficulty level, can affect the LLM’s performance. To address these issues, we introduce Iter-CoT (Iterative bootstrapping in Chain-of-Thoughts prompting). Iter-CoT has two advantages: (1) it adopts iterative bootstrapping that enables LLMs to rectify errors autonomously, resulting in more precise and comprehensive reasoning chains; (2) it selects exemplars of challenging yet answerable (i.e., the LLM has the potential to answer correctly) questions, enhancing the LLMs’ generalizability to answer questions with varying difficulty levels. Experimental results demonstrate the superior performance of Iter-CoT on three distinct reasoning tasks across ten datasets.
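The iterative bootstrapping loop can be pictured roughly as follows; this is a hedged sketch of the idea, with `ask_llm` and `extract_answer` as hypothetical stand-ins rather than the paper’s actual interface.

```python
# Sketch of the iterative bootstrapping loop (one reading of the idea;
# `ask_llm` and `extract_answer` are hypothetical stand-ins).
from typing import Callable, Dict, Optional


def bootstrap_exemplar(question: str,
                       gold_answer: str,
                       ask_llm: Callable[[str], str],
                       extract_answer: Callable[[str], str],
                       max_rounds: int = 3) -> Optional[Dict[str, str]]:
    """Revise a reasoning chain until its final answer matches the gold one."""
    rationale = ask_llm(f"Q: {question}\nLet's think step by step.")
    for revisions in range(max_rounds + 1):
        if extract_answer(rationale) == gold_answer:
            # Keep the question-rationale-answer triplet as an exemplar; the
            # revision count is a rough signal of how challenging it was.
            return {"question": question, "rationale": rationale,
                    "answer": gold_answer, "revisions": str(revisions)}
        if revisions == max_rounds:
            break  # revision budget exhausted
        # Self-correction step: tell the model its answer is wrong and ask it
        # to revise its own reasoning chain.
        rationale = ask_llm(
            f"Q: {question}\nYour previous reasoning was:\n{rationale}\n"
            "The final answer is incorrect. Please revise the reasoning."
        )
    return None  # never answered correctly: drop from the exemplar pool
```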
APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning
Jiashuo Sun | Hang Zhang | Chen Lin | Xiangdong Su | Yeyun Gong | Jian Guo
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Long-form numerical reasoning aims to generate a reasoning program that calculates the answer to a given question. Previous work followed a retriever-generator framework, where the retriever selects key facts from a long-form document and the generator produces a reasoning program based on the retrieved facts. However, these methods treated all facts equally, without considering the different contributions of facts with and without numerical information. Furthermore, they ignored program consistency, leading to the incorrect penalization of programs that differ from the ground truth. To address these issues, we propose APOLLO (An optimized training aPproach fOr Long-form numericaL reasOning) to improve long-form numerical reasoning. APOLLO includes a number-aware negative sampling strategy for the retriever, to discriminate key numerical facts, and consistency-based reinforcement learning with target program augmentation for the generator, to ultimately increase execution accuracy. Experimental results on the FinQA and ConvFinQA leaderboards verify the effectiveness of our proposed methods, achieving a new state-of-the-art.
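As an illustration of the number-aware negative sampling idea (not the authors’ implementation), one could up-weight candidate negatives that contain numerals so the retriever is pushed to discriminate key numerical facts:

```python
# Illustration of number-aware negative sampling (not the authors' code):
# candidate negatives that contain numerals are sampled more often, so the
# retriever learns to tell key numerical facts apart from the rest.
import random
import re
from typing import List

NUM_RE = re.compile(r"\d")


def number_aware_negatives(candidate_sentences: List[str],
                           gold_facts: List[str],
                           k: int = 4,
                           numeric_weight: float = 3.0) -> List[str]:
    """Sample k negatives (with replacement), up-weighting numeric sentences."""
    gold = set(gold_facts)
    pool = [s for s in candidate_sentences if s not in gold]
    if not pool:
        return []
    weights = [numeric_weight if NUM_RE.search(s) else 1.0 for s in pool]
    return random.choices(pool, weights=weights, k=min(k, len(pool)))
```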
Knowledge Enhanced Pre-training for Cross-lingual Dense Retrieval
Hang Zhang | Yeyun Gong | Dayiheng Liu | Shunyu Zhang | Xingwei He | Jiancheng Lv | Jian Guo
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In recent years, multilingual pre-trained language models (mPLMs) have achieved significant progress in cross-lingual dense retrieval. However, most mPLMs neglect the importance of knowledge. Knowledge always conveys similar semantic concepts in a language-agnostic manner, while query-passage pairs in cross-lingual retrieval also share common factual information. Motivated by this observation, we introduce KEPT, a novel mPLM that effectively leverages knowledge to learn language-agnostic semantic representations. To achieve this, we construct a multilingual knowledge base using hyperlinks and cross-language page alignment data annotated by Wiki. From this knowledge base, we mine intra- and cross-language pairs by extracting symmetrically linked segments and multilingual entity descriptions. Subsequently, we adopt contrastive learning with the mined pairs to pre-train KEPT. We evaluate KEPT on three widely-used benchmarks, considering both zero-shot cross-lingual transfer and supervised multilingual fine-tuning scenarios. Extensive experimental results demonstrate that KEPT achieves strong multilingual and cross-lingual retrieval performance with significant improvements over existing mPLMs.
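The contrastive pre-training step over the mined intra- and cross-language pairs can be expressed with a generic in-batch InfoNCE loss; the sketch below is one standard formulation under assumed conventions, not KEPT’s exact training objective or code.

```python
# Generic in-batch InfoNCE loss over mined pairs (one standard formulation,
# assumed here; not KEPT's exact training objective or code).
import torch
import torch.nn.functional as F


def contrastive_loss(anchor_emb: torch.Tensor,
                     positive_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """Row i of `anchor_emb` should match row i of `positive_emb`; every other
    row in the batch acts as a negative."""
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    logits = a @ p.T / temperature                 # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)
```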
Unlocking the Power of Large Language Models for Entity Alignment
Xuhui Jiang | Yinghan Shen | Zhichao Shi | Chengjin Xu | Wei Li | Zixuan Li | Jian Guo | Huawei Shen | Yuanzhuo Wang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Entity Alignment (EA) is vital for integrating diverse knowledge graph (KG) data, playing a crucial role in data-driven AI applications. Traditional EA methods primarily rely on comparing entity embeddings, but their effectiveness is constrained by the limited input KG data and the capabilities of the representation learning techniques. Against this backdrop, we introduce ChatEA, an innovative framework that incorporates large language models (LLMs) to improve EA. To address the constraints of limited input KG data, ChatEA introduces a KG-code translation module that translates KG structures into a format understandable by LLMs, thereby allowing LLMs to utilize their extensive background knowledge to improve EA accuracy. To overcome the over-reliance on entity embedding comparisons, ChatEA implements a two-stage EA strategy that capitalizes on LLMs’ capability for multi-step reasoning in a dialogue format, thereby enhancing accuracy while preserving efficiency. Our experimental results affirm ChatEA’s superior performance, highlighting LLMs’ potential in facilitating EA tasks. The source code is available at https://anonymous.4open.science/r/ChatEA/.
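The KG-code translation idea can be illustrated by rendering an entity and its one-hop neighborhood as a class-like snippet that an LLM can read; the template below is hypothetical and only meant to convey the flavor of the module, not ChatEA’s actual format.

```python
# Hypothetical rendering of a KG entity as a code-like snippet for an LLM to
# read; the actual ChatEA template may differ.
from typing import Dict, List, Tuple


def entity_to_code(name: str,
                   attributes: Dict[str, str],
                   relations: List[Tuple[str, str]]) -> str:
    """Serialize an entity and its one-hop neighborhood as a class-like text."""
    lines = [f"class Entity_{name.replace(' ', '_')}:"]
    for key, value in attributes.items():
        lines.append(f"    {key} = {value!r}")
    for relation, neighbor in relations:
        lines.append(f"    # {relation} -> {neighbor}")
    return "\n".join(lines)


print(entity_to_code("Berlin",
                     {"entity_type": "City", "country": "Germany"},
                     [("capital_of", "Germany"), ("located_in", "Europe")]))
```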
2023
Noisy Pair Corrector for Dense Retrieval
Hang Zhang | Yeyun Gong | Xingwei He | Dayiheng Liu | Daya Guo | Jiancheng Lv | Jian Guo
Findings of the Association for Computational Linguistics: EMNLP 2023
Most dense retrieval models contain an implicit assumption: the training query-document pairs are exactly matched. Since it is expensive to annotate the corpus manually, training pairs in real-world applications are usually collected automatically, which inevitably introduces mismatched-pair noise. In this paper, we explore an interesting and challenging problem in dense retrieval: how to train an effective model with mismatched-pair noise. To solve this problem, we propose a novel approach called Noisy Pair Corrector (NPC), which consists of a detection module and a correction module. The detection module estimates noisy pairs by calculating the perplexity between annotated positive and easy negative documents. The correction module utilizes an exponential moving average (EMA) model to provide a soft supervised signal, aiding in mitigating the effects of noise. We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA and the code-search benchmarks StaQC and SO-DS. Experimental results show that NPC achieves excellent performance in handling both synthetic and realistic noise.
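A rough sketch of the two modules as described above: a pair is flagged as noisy when the model scores its annotated positive no better than easy negatives, and an EMA copy of the model supplies the soft signal. The `score` callable and the margin are assumptions for illustration, not the released implementation.

```python
# Sketch of the two NPC ideas under stated assumptions (not the released code):
# (1) flag a pair as noisy when the annotated positive does not out-score easy
# negatives, and (2) keep an EMA copy of the model to supply soft targets.
from typing import Callable, List, Sequence


def is_noisy_pair(query: str,
                  annotated_positive: str,
                  easy_negatives: Sequence[str],
                  score: Callable[[str, str], float],
                  margin: float = 0.0) -> bool:
    """Flag a training pair whose positive does not beat the easy negatives."""
    pos = score(query, annotated_positive)
    best_neg = max(score(query, n) for n in easy_negatives)
    return pos - best_neg <= margin


def ema_update(teacher_params: Sequence[float],
               student_params: Sequence[float],
               decay: float = 0.999) -> List[float]:
    """teacher <- decay * teacher + (1 - decay) * student, element-wise."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]
```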
2022
Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis
Shuai Fan | Chen Lin | Haonan Li | Zhenghao Lin | Jinsong Su | Hang Zhang | Yeyun Gong | Jian Guo | Nan Duan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Most existing pre-trained language representation models (PLMs) are sub-optimal in sentiment analysis tasks, as they capture sentiment information at the word level while under-considering sentence-level information. In this paper, we propose SentiWSP, a novel Sentiment-aware pre-trained language model with combined Word-level and Sentence-level Pre-training tasks. The word-level pre-training task detects replaced sentiment words via a generator-discriminator framework to enhance the PLM’s knowledge about sentiment words. The sentence-level pre-training task further strengthens the discriminator via a contrastive learning framework, with similar sentences as negative samples, to encode sentiments in a sentence. Extensive experimental results show that SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks. We have made our code and model publicly available at https://github.com/XMUDM/SentiWSP.
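A simplified illustration of how word-level detection examples might be constructed: sentiment words are swapped and each token is labeled as replaced or original for the discriminator. The lexicon-based swap below is a stand-in for the learned generator used in the actual model.

```python
# Simplified construction of word-level detection examples: sentiment words
# are swapped and each token is labeled replaced (1) or original (0). The
# lexicon-based swap is a stand-in for the learned generator in the paper.
import random
from typing import Dict, List, Tuple

SENTIMENT_LEXICON = {"good", "bad", "great", "terrible", "love", "hate"}


def make_detection_example(tokens: List[str],
                           replacements: Dict[str, List[str]],
                           replace_prob: float = 0.5) -> Tuple[List[str], List[int]]:
    """Corrupt some sentiment words and emit per-token replacement labels."""
    corrupted, labels = [], []
    for tok in tokens:
        low = tok.lower()
        if low in SENTIMENT_LEXICON and random.random() < replace_prob:
            corrupted.append(random.choice(replacements.get(low, ["okay"])))
            labels.append(1)  # the discriminator should detect this token
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels


tokens = "I love this movie but the ending was bad".split()
print(make_detection_example(tokens, {"love": ["dislike"], "bad": ["fine"]}))
```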