2024
Hybrid Alignment Training for Large Language Models
Chenglong Wang | Hang Zhou | Kaiyan Chang | Bei Li | Yongyu Mu | Tong Xiao | Tongran Liu | Jingbo Zhu
Findings of the Association for Computational Linguistics: ACL 2024
Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed in two stages with different objectives: instruction-following alignment and human-preference alignment. However, aligning LLMs with these objectives in sequence suffers from an inherent problem: the objectives may conflict, and the LLMs cannot be guaranteed to align well with both the instructions and human preferences simultaneously. In response to this, we propose a Hybrid Alignment Training (Hbat) approach based on alternating alignment and modified elastic weight consolidation methods. The basic idea is to alternate between the different objectives during alignment training, so that better collaboration can be achieved between the two alignment tasks. We experiment with Hbat on summarization and dialogue tasks. Experimental results show that the proposed Hbat significantly outperforms all baselines. Notably, Hbat yields consistent performance gains over the traditional two-stage alignment training when using both proximal policy optimization and direct preference optimization.
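A minimal sketch of the alternating idea described in this abstract, assuming a PyTorch-style setup; the loss functions, the EWC weighting, and the per-step alternation schedule are illustrative assumptions rather than the paper's actual implementation:

```python
# Sketch only: alternating alignment objectives with an EWC-style penalty.
# All names (instruction_loss_fn, preference_loss_fn, fisher, ref_params)
# are assumptions for illustration, not the paper's code.
import torch


def ewc_penalty(model, ref_params, fisher, lam=0.1):
    """Quadratic penalty that discourages drifting from reference parameters."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - ref_params[name]) ** 2).sum()
    return lam * penalty


def hybrid_alignment_step(model, optimizer, batch, step,
                          ref_params, fisher,
                          instruction_loss_fn, preference_loss_fn):
    """Alternate between the two alignment objectives across training steps."""
    optimizer.zero_grad()
    if step % 2 == 0:
        loss = instruction_loss_fn(model, batch)   # instruction-following objective
    else:
        loss = preference_loss_fn(model, batch)    # human-preference objective
    loss = loss + ewc_penalty(model, ref_params, fisher)
    loss.backward()
    optimizer.step()
    return loss.item()
```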
2023
Augmenting Large Language Model Translators via Translation Memories
Yongyu Mu | Abudurexiti Reheman | Zhiquan Cao | Yuchun Fan | Bei Li | Yinqiao Li | Tong Xiao | Chunliang Zhang | Jingbo Zhu
Findings of the Association for Computational Linguistics: ACL 2023
Using translation memories (TMs) as prompts is a promising approach to in-context learning of machine translation models. In this work, we take a step towards prompting large language models (LLMs) with TMs and making them better translators. We find that the ability of LLMs to “understand” prompts is indeed helpful for making better use of TMs. Experiments show that the results of a pre-trained LLM translator can be greatly improved by using high-quality TM-based prompts. These results are even comparable to those of the state-of-the-art NMT systems which have access to large-scale in-domain bilingual data and are well tuned on the downstream tasks.
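A minimal sketch of TM-based prompting as described in this abstract; the prompt wording, the language pair, and the assumption that high-similarity TM matches have already been retrieved are illustrative, not the paper's exact format:

```python
# Sketch only: build an in-context prompt from retrieved TM pairs.
def build_tm_prompt(source_sentence, tm_pairs, src_lang="German", tgt_lang="English"):
    """Prepend retrieved TM pairs as in-context examples before the source sentence."""
    lines = [f"Translate from {src_lang} to {tgt_lang}."]
    for src, tgt in tm_pairs:  # tm_pairs: high-similarity matches from the TM
        lines.append(f"{src_lang}: {src}\n{tgt_lang}: {tgt}")
    lines.append(f"{src_lang}: {source_sentence}\n{tgt_lang}:")
    return "\n\n".join(lines)


prompt = build_tm_prompt(
    "Der Vertrag tritt am 1. Januar in Kraft.",
    [("Der Vertrag tritt morgen in Kraft.",
      "The contract enters into force tomorrow.")],
)
```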
2022
Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
Chenglong Wang | Yi Lu | Yongyu Mu | Yimin Hu | Tong Xiao | Jingbo Zhu
Findings of the Association for Computational Linguistics: EMNLP 2022
Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use of them to train the student model. Our preliminary study shows that: (1) not all of the knowledge is necessary for learning a good student model, and (2) knowledge distillation can benefit from certain knowledge at different training steps. In response to these findings, we propose an actor-critic approach to selecting appropriate knowledge to transfer during the process of knowledge distillation. In addition, we offer a refinement of the training algorithm to ease the computational burden. Experimental results on the GLUE datasets show that our method significantly outperforms several strong knowledge distillation baselines.
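A minimal sketch of step-wise knowledge selection, assuming a small policy network that weights several teacher-side distillation losses; the network shape, the state features, and the loss interface are assumptions, and the critic/reward side of the actor-critic method is omitted:

```python
# Sketch only: weight several distillation signals with a small policy network.
import torch
import torch.nn as nn


class KnowledgeSelector(nn.Module):
    """Actor that scores each type of teacher knowledge at the current step."""

    def __init__(self, num_knowledge_types, state_dim):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_knowledge_types),
        )

    def forward(self, state):
        # Probability of transferring each knowledge type
        # (e.g. logits, hidden states, attention maps) at this step.
        return torch.softmax(self.policy(state), dim=-1)


def combined_kd_loss(selector, state, knowledge_losses):
    """Weight the individual distillation losses by the actor's selection."""
    weights = selector(state)               # shape: [num_knowledge_types]
    losses = torch.stack(knowledge_losses)  # one scalar loss per knowledge type
    return (weights * losses).sum()
```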
2021
The NiuTrans Machine Translation Systems for WMT21
Shuhan Zhou | Tao Zhou | Binghao Wei | Yingfeng Luo | Yongyu Mu | Zefan Zhou | Chenglong Wang | Xuanjun Zhou | Chuanhao Lv | Yi Jing | Laohu Wang | Jingnan Zhang | Canan Huang | Zhongxiang Yan | Chi Hu | Bei Li | Tong Xiao | Jingbo Zhu
Proceedings of the Sixth Conference on Machine Translation
This paper describes the NiuTrans neural machine translation systems for the WMT 2021 news translation tasks. We made submissions in 9 language directions, including English<->Chinese, English<->Japanese, English<->Russian, English<->Icelandic, and English->Hausa. Our primary systems are built on several effective variants of the Transformer, e.g., Transformer-DLCL and ODE-Transformer. We also utilize back-translation, knowledge distillation, post-ensemble, and iterative fine-tuning techniques to further enhance model performance.
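A minimal sketch of the back-translation step used in systems like this, assuming a reverse-direction model with a translate() method; the interface is illustrative rather than the NiuTrans implementation:

```python
# Sketch only: create synthetic bitext from target-side monolingual data.
def back_translate(target_monolingual, reverse_model):
    """Build synthetic (source, target) pairs by translating target text backwards."""
    synthetic_pairs = []
    for tgt_sentence in target_monolingual:
        # reverse_model.translate() is an assumed interface: target -> source.
        synthetic_src = reverse_model.translate(tgt_sentence)
        synthetic_pairs.append((synthetic_src, tgt_sentence))
    return synthetic_pairs


# The synthetic pairs are then mixed with genuine bitext to train the
# forward-direction model; repeating the cycle gives iterative back-translation.
```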
The NiuTrans System for the WMT 2021 Efficiency Task
Chenglong Wang | Chi Hu | Yongyu Mu | Zhongxiang Yan | Siming Wu | Yimin Hu | Hang Cao | Bei Li | Ye Lin | Tong Xiao | Jingbo Zhu
Proceedings of the Sixth Conference on Machine Translation
This paper describes the NiuTrans system for the WMT21 translation efficiency task. Following last year’s work, we explore various techniques to improve efficiency while maintaining translation quality. We investigate combinations of lightweight Transformer architectures and knowledge distillation strategies. We also improve translation efficiency with graph optimization, low precision, dynamic batching, and parallel pre/post-processing. Putting these together, our system can translate 247,000 words per second on an NVIDIA A100, 3× faster than our system from last year. Our system is the fastest and has the lowest memory consumption on the GPU-throughput track. The code, model, and pipeline will be available at NiuTrans.NMT.
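A minimal sketch of length-based dynamic batching, one of the techniques mentioned in this abstract; the token budget and packing rule are illustrative assumptions, not the system's actual scheduler:

```python
# Sketch only: pack length-sorted sentences into batches under a token budget.
def dynamic_batches(sentences, max_tokens=4096):
    """Yield batches whose padded size stays under a fixed token budget."""
    ordered = sorted(sentences, key=len)
    batch, longest = [], 0
    for sent in ordered:
        longest = max(longest, len(sent))
        # Padded batch size is (longest length) x (number of sentences).
        if batch and longest * (len(batch) + 1) > max_tokens:
            yield batch
            batch, longest = [], len(sent)
        batch.append(sent)
    if batch:
        yield batch
```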
2020
The NiuTrans Machine Translation Systems for WMT20
Yuhao Zhang | Ziyang Wang | Runzhe Cao | Binghao Wei | Weiqiao Shan | Shuhan Zhou | Abudurexiti Reheman | Tao Zhou | Xin Zeng | Laohu Wang | Yongyu Mu | Jingnan Zhang | Xiaoqian Liu | Xuanjun Zhou | Yinqiao Li | Bei Li | Tong Xiao | Jingbo Zhu
Proceedings of the Fifth Conference on Machine Translation
This paper describes the NiuTrans neural machine translation systems for the WMT20 news translation tasks. We participated in five tasks in total: Japanese<->English, English->Chinese, Inuktitut->English, and Tamil->English, and ranked first in both directions of Japanese<->English. We mainly utilized iterative back-translation, model architectures of different depths and widths, iterative knowledge distillation, and iterative fine-tuning. We find that adequately widening and deepening the model simultaneously brings significant performance improvements. The iterative fine-tuning strategy we implemented is also effective for domain adaptation. For the Inuktitut->English and Tamil->English tasks, we built multilingual models separately and employed pretrained word embeddings to obtain better performance.