Bootstrapped Policy Learning for Task-oriented Dialogue through Goal Shaping

Yangyang Zhao, Ben Niu, Mehdi Dastani, Shihan Wang


Abstract
Reinforcement learning shows promise in optimizing dialogue policies, but reward sparsity remains a crucial challenge. Curriculum learning offers a practical solution by training policies from simple to complex goals, but it assumes that goal difficulty increases gradually, so that knowledge transfers smoothly across levels of complexity. In complex dialogue environments without intermediate goals, such smooth knowledge transitions are hard to achieve. This paper proposes a novel Bootstrapped Policy Learning (BPL) framework, which adaptively tailors a progressively challenging subgoal curriculum for each complex goal through goal shaping, ensuring a smooth knowledge transition. Goal shaping involves goal decomposition and evolution: it decomposes complex goals into subgoals of the maximum difficulty the policy can currently solve, and progressively increases their difficulty as the policy improves. Moreover, to enhance BPL’s adaptability across environments, we explore various combinations of goal decomposition and evolution within BPL, and identify two universal curriculum patterns that remain effective across different dialogue environments, independent of specific environmental constraints. By integrating these curriculum patterns, BPL exhibits efficacy and versatility across four publicly available datasets with different difficulty levels.
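The abstract describes goal shaping only at a high level. The following is a minimal illustrative sketch of the general idea (expose a subgoal of limited difficulty, then grow it as the policy succeeds), not the authors' implementation. The class `GoalShaper`, the slot-tuple goal representation, the use of "number of constraints" as the difficulty measure, and the `threshold` and `window` parameters are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GoalShaper:
    """Toy subgoal curriculum for one complex goal (hypothetical sketch)."""
    full_goal: tuple          # assumed: the goal is a tuple of slot constraints
    threshold: float = 0.8    # assumed: success rate required before evolving
    window: int = 5           # assumed: number of recent episodes to average
    size: int = 1             # current subgoal difficulty (number of slots)
    history: list = field(default_factory=list)

    def subgoal(self) -> tuple:
        # Goal decomposition: expose only the first `size` constraints.
        return self.full_goal[:self.size]

    def report(self, success: bool) -> None:
        # Goal evolution: once the recent success rate reaches the threshold,
        # add one more constraint (never exceeding the full goal).
        self.history.append(success)
        recent = self.history[-self.window:]
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.threshold
                and self.size < len(self.full_goal)):
            self.size += 1
            self.history.clear()


# Usage: five successful episodes on the 1-slot subgoal unlock a 2-slot one.
shaper = GoalShaper(("cuisine", "area", "price"))
for _ in range(5):
    shaper.report(True)
current = shaper.subgoal()
```

In this sketch the curriculum is driven purely by the agent's recent success rate; the paper itself explores several combinations of decomposition and evolution strategies, which this toy example does not attempt to reproduce.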
Anthology ID:
2024.emnlp-main.263
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4566–4580
URL:
https://aclanthology.org/2024.emnlp-main.263/
DOI:
10.18653/v1/2024.emnlp-main.263
Cite (ACL):
Yangyang Zhao, Ben Niu, Mehdi Dastani, and Shihan Wang. 2024. Bootstrapped Policy Learning for Task-oriented Dialogue through Goal Shaping. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 4566–4580, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Bootstrapped Policy Learning for Task-oriented Dialogue through Goal Shaping (Zhao et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.263.pdf