Weak-to-Strong Reasoning

Yuqing Yang, Yan Ma, Pengfei Liu


Abstract
When large language models (LLMs) surpass human capabilities, supervising them effectively becomes difficult. Weak-to-strong learning, in which a less capable model improves a stronger one, is valuable in this context, yet the efficacy of this paradigm for complex reasoning tasks remains underexplored. In this paper, we introduce a progressive weak-to-strong reasoning framework that enables the strong model to autonomously refine its training data, maximizing the use of weak signals and unlocking its latent abilities. The framework begins with supervised fine-tuning on a small, selectively curated, high-quality dataset, followed by preference optimization on contrastive samples identified by the strong model itself. Experiments on the GSM8K and MATH datasets show that our method effectively improves the reasoning capabilities of Llama2-70b using three separate weak models. This work paves the way for a more scalable and sophisticated strategy to enhance AI reasoning. All relevant code and resources are available at https://github.com/GAIR-NLP/weak-to-strong-reasoning.
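
To make the two-stage recipe in the abstract concrete, below is a minimal, hypothetical Python sketch of one plausible reading of it: a final-answer-consistency filter to assemble the small Stage-1 fine-tuning set, followed by majority-vote construction of contrastive (chosen, rejected) pairs for Stage-2 preference optimization. Every name and heuristic here is an assumption for illustration, not the authors' code; the actual implementation lives at https://github.com/GAIR-NLP/weak-to-strong-reasoning.

```python
# A minimal, hypothetical sketch of the two-stage framework described in the
# abstract. All names (Sample, select_high_quality, build_preference_pairs)
# and the consistency/majority-vote heuristics are illustrative assumptions,
# not the authors' implementation; see the linked repository for that.

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    question: str
    solution: str  # chain-of-thought reasoning produced by a model
    answer: str    # final answer extracted from the solution


def select_high_quality(weak_samples, strong_icl_samples):
    """Stage 1: assemble a small, high-quality fine-tuning set without gold
    labels by keeping only problems where the weak model's final answer
    agrees with the strong model's own in-context-learning answer."""
    return [
        w
        for w, s in zip(weak_samples, strong_icl_samples)
        if w.answer == s.answer  # final-answer consistency as a correctness proxy
    ]


def build_preference_pairs(strong_samples):
    """Stage 2: after fine-tuning, the strong model samples several solutions
    per question; solutions reaching the majority answer are treated as
    'chosen' and the rest as 'rejected', yielding contrastive pairs."""
    by_question = defaultdict(list)
    for s in strong_samples:
        by_question[s.question].append(s)

    pairs = []
    for solutions in by_question.values():
        majority = Counter(s.answer for s in solutions).most_common(1)[0][0]
        chosen = [s for s in solutions if s.answer == majority]
        rejected = [s for s in solutions if s.answer != majority]
        # Pair each preferred solution with each dispreferred one.
        pairs.extend((c, r) for c in chosen for r in rejected)
    return pairs
```

The resulting (chosen, rejected) pairs would then feed a standard pairwise preference-optimization objective such as DPO.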
Anthology ID:
2024.findings-emnlp.490
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8350–8367
URL:
https://aclanthology.org/2024.findings-emnlp.490/
DOI:
10.18653/v1/2024.findings-emnlp.490
Cite (ACL):
Yuqing Yang, Yan Ma, and Pengfei Liu. 2024. Weak-to-Strong Reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 8350–8367, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Weak-to-Strong Reasoning (Yang et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.490.pdf