LLMR: Knowledge Distillation with a Large Language Model-Induced Reward

Dongheng Li, Yongchang Hao, Lili Mou


Abstract
Large language models have become increasingly popular and have demonstrated remarkable performance in various natural language processing (NLP) tasks. However, these models are typically computationally expensive and difficult to deploy in resource-constrained environments. In this paper, we propose LLMR, a novel knowledge distillation (KD) method based on a reward function induced from large language models. We conducted experiments on multiple datasets for the dialogue generation and summarization tasks. Empirical results demonstrate that our LLMR approach consistently outperforms traditional KD methods across different tasks and datasets.
Anthology ID:
2024.lrec-main.932
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
10657–10664
URL:
https://aclanthology.org/2024.lrec-main.932
Cite (ACL):
Dongheng Li, Yongchang Hao, and Lili Mou. 2024. LLMR: Knowledge Distillation with a Large Language Model-Induced Reward. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 10657–10664, Torino, Italia. ELRA and ICCL.
Cite (Informal):
LLMR: Knowledge Distillation with a Large Language Model-Induced Reward (Li et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.932.pdf