2021
Clipping Loops for Sample-Efficient Dialogue Policy Optimisation
Yen-Chen Wu | Carl Edward Rasmussen
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Training dialogue agents requires a large number of interactions with users: over a lengthy dialogue, the agent has no idea which of its responses were bad. In this paper, we propose loop-clipping policy optimisation (LCPO) to eliminate useless responses. LCPO consists of two stages: loop clipping and advantage clipping. In loop clipping, we clip useless responses (called loops) out of the dialogue history (called trajectories). The clipped trajectories are more succinct than the originals, so state-value estimation becomes more accurate. In advantage clipping, we estimate and clip the advantages of useless responses and normal ones separately. The clipped advantages distinguish useless actions from the rest and efficiently reduce the probabilities of useless actions. In experiments on the Cambridge Restaurant Dialogue System, LCPO uses only 260 training dialogues to achieve an 80% success rate, while the PPO baseline requires 2160 dialogues. In addition, LCPO scores 3.7/5 in a human evaluation in which the agent interactively collects 100 real-user dialogues during the training phase.
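A minimal sketch of the loop-clipping stage, assuming dialogue states are hashable so that a revisit can be detected; the paper's actual state representation and clipping rule may differ.

from typing import Dict, Hashable, List, Tuple

# A transition is (state, action, reward). Treating states as hashable
# is an illustrative simplification of the dialogue state.
Transition = Tuple[Hashable, int, float]

def clip_loops(trajectory):
    """Split a trajectory into a loop-free part and the clipped loops.

    Whenever a state reappears, every transition since its first
    occurrence is treated as a useless loop and clipped off.
    """
    kept: List[Transition] = []
    loops: List[Transition] = []
    first_seen: Dict[Hashable, int] = {}   # state -> index in `kept`
    for t in trajectory:
        state = t[0]
        if state in first_seen:
            i = first_seen[state]          # clip the whole cycle back here
            loops.extend(kept[i:])
            for s, _, _ in kept[i:]:
                del first_seen[s]
            kept = kept[:i]
        first_seen[state] = len(kept)
        kept.append(t)
    return kept, loops

In the second stage, advantages would be estimated on the clipped trajectory, while the transitions collected in loops receive a separately estimated and clipped (negative) advantage, so the probabilities of those useless actions are pushed down.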
2020
Actor-Double-Critic: Incorporating Model-Based Critic for Task-Oriented Dialogue Systems
Yen-Chen Wu | Bo-Hsiang Tseng | Milica Gašić
Findings of the Association for Computational Linguistics: EMNLP 2020
To improve the sample efficiency of deep reinforcement learning (DRL), we implemented the imagination-augmented agent (I2A) in spoken dialogue systems (SDS). Although I2A achieves a higher success rate than the baselines by augmenting the policy network with predicted futures, its complicated architecture introduces unwanted instability. In this work, we propose the actor-double-critic (ADC) to improve the stability and overall performance of I2A. ADC simplifies the architecture of I2A to reduce excessive parameters and hyper-parameters. More importantly, a separate model-based critic shares parameters across actions and makes back-propagation through the model explicit. In our experiments on the Cambridge Restaurant Booking task, ADC improves success rates considerably and shows robustness to imperfect environment models. In addition, ADC exhibits stability and sample efficiency, significantly reducing the baseline's standard deviation in success rate and reaching an 80% success rate with half the training data.
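An illustrative PyTorch skeleton of the actor-double-critic idea: one model-free critic V(s) plus a model-based critic that scores every action through a single learned environment model, so parameters are shared across actions and the value gradient flows back through the model explicitly. Layer sizes, the environment-model interface, and all names are assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class EnvModel(nn.Module):
    """Predicts the next belief state and reward for a (state, action) pair."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),   # next state + reward
        )

    def forward(self, state, action_onehot):
        out = self.net(torch.cat([state, action_onehot], dim=-1))
        return out[..., :-1], out[..., -1]      # next_state, reward

class ActorDoubleCritic(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.n_actions = n_actions
        self.actor = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))
        self.mf_critic = nn.Sequential(         # model-free critic V(s)
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.env_model = EnvModel(state_dim, n_actions, hidden)

    def model_based_q(self, state, gamma=0.99):
        """Q(s, a) = r_hat + gamma * V(s_hat), reusing one environment
        model for all actions; back-propagation through it is explicit."""
        qs = []
        for a in range(self.n_actions):
            onehot = torch.zeros(state.size(0), self.n_actions)
            onehot[:, a] = 1.0
            next_s, r = self.env_model(state, onehot)
            qs.append(r + gamma * self.mf_critic(next_s).squeeze(-1))
        return torch.stack(qs, dim=-1)

    def forward(self, state):
        return self.actor(state), self.mf_critic(state), self.model_based_q(state)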
2019
Tree-Structured Semantic Encoder with Knowledge Sharing for Domain Adaptation in Natural Language Generation
Bo-Hsiang Tseng | Paweł Budzianowski | Yen-Chen Wu | Milica Gašić
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
Domain adaptation in natural language generation (NLG) remains challenging because of the high complexity of input semantics across domains and the limited data available for a target domain. This is particularly the case for dialogue systems, where we want to be able to seamlessly include new domains into the conversation. It is therefore crucial for generation models to share knowledge across domains for effective adaptation from one domain to another. In this study, we exploit a tree-structured semantic encoder to capture the internal structure of the complex semantic representations required for multi-domain dialogues, in order to facilitate knowledge sharing across domains. In addition, a layer-wise attention mechanism between the tree encoder and the decoder is adopted to further improve the model's capability. The automatic evaluation results show that our model outperforms previous methods in terms of BLEU score and slot error rate, particularly when the adaptation data is limited. In subjective evaluation, human judges tend to prefer the sentences generated by our model, rating them higher on informativeness and naturalness than other systems.
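A minimal child-sum tree encoder sketch for a semantic tree such as domain -> dialogue act -> slots, in the spirit of the abstract; the node vocabulary, the composition function, and the per-node states collected for layer-wise attention are illustrative assumptions.

import torch
import torch.nn as nn

class TreeNode:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

class TreeEncoder(nn.Module):
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.vocab = vocab                       # label -> index
        self.embed = nn.Embedding(len(vocab), dim)
        self.compose = nn.Linear(2 * dim, dim)

    def encode(self, node, states):
        """Bottom-up encoding; `states` collects per-node vectors that an
        attention mechanism in the decoder could attend to."""
        emb = self.embed(torch.tensor(self.vocab[node.label]))
        child_sum = sum((self.encode(c, states) for c in node.children),
                        torch.zeros_like(emb))
        h = torch.tanh(self.compose(torch.cat([emb, child_sum])))
        states.append(h)
        return h

# e.g. inform(food, area) in the restaurant domain:
vocab = {"restaurant": 0, "inform": 1, "food": 2, "area": 3}
tree = TreeNode("restaurant", [TreeNode("inform",
                               [TreeNode("food"), TreeNode("area")])])
states = []
root = TreeEncoder(vocab).encode(tree, states)   # root vector + per-node states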
2018
Addressing Objects and Their Relations: The Conversational Entity Dialogue Model
Stefan Ultes | Paweł Budzianowski | Iñigo Casanueva | Lina M. Rojas-Barahona | Bo-Hsiang Tseng | Yen-Chen Wu | Steve Young | Milica Gašić
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue
Statistical spoken dialogue systems usually rely on a single- or multi-domain dialogue model whose ability to model complex dialogue structures, e.g., relations, is restricted. In this work, we propose a novel dialogue model that is centred around entities and is able to model relations as well as multiple entities of the same type. In a prototype implementation, we demonstrate the benefits of relation modelling on the dialogue level and show that a trained policy using these relations outperforms the multi-domain baseline. Furthermore, we show that by modelling relations on the dialogue level, the system is capable of processing relations present in the user input and even learns to address them in the system response.
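A sketch of what an entity-centred dialogue state with relations might look like as a data structure, following the abstract's description; the field names and the relation encoding are assumptions for illustration.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Entity:
    entity_id: str
    entity_type: str                      # e.g. "hotel" or "restaurant"
    slots: Dict[str, str] = field(default_factory=dict)

@dataclass
class Relation:
    subject: str                          # entity_id of one entity
    slot: str                             # the slot the relation constrains
    obj: str                              # entity_id of the other entity

@dataclass
class DialogueState:
    entities: Dict[str, Entity] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)

# "a hotel in the same area as the restaurant": two entities linked by a
# relation; a second restaurant would simply be another Entity of the
# same type, which a fixed single- or multi-domain state cannot represent.
state = DialogueState()
state.entities["r1"] = Entity("r1", "restaurant", {"food": "thai"})
state.entities["h1"] = Entity("h1", "hotel")
state.relations.append(Relation("h1", "area", "r1"))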
Feudal Dialogue Management with Jointly Learned Feature Extractors
Iñigo Casanueva | Paweł Budzianowski | Stefan Ultes | Florian Kreyssig | Bo-Hsiang Tseng | Yen-Chen Wu | Milica Gašić
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue
Reinforcement learning (RL) is a promising approach to dialogue policy optimisation, but traditional RL algorithms fail to scale to large domains. Recently, Feudal Dialogue Management (FDM) has been shown to increase scalability to large domains by decomposing the dialogue management decision into two steps, using the domain ontology to abstract the dialogue state in each step. To abstract the state space, however, previous work on FDM relies on handcrafted feature functions. In this work, we show that these feature functions can be learned jointly with the policy model while obtaining similar performance, even outperforming the handcrafted features in several environments and domains.
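A sketch of the two-step feudal decision with learned feature extractors: a master policy first chooses between slot-independent and slot-dependent actions, and shared heads then pick the concrete action, with the state abstractions phi learned end-to-end instead of handcrafted. Sizes, the two-branch split, and all names are assumptions.

import torch
import torch.nn as nn

class FeudalPolicy(nn.Module):
    def __init__(self, state_dim, n_slot_indep, n_per_slot, dim=64):
        super().__init__()
        # Learned feature extractors replacing handcrafted phi functions.
        self.phi_master = nn.Sequential(nn.Linear(state_dim, dim), nn.ReLU())
        self.phi_slot = nn.Sequential(nn.Linear(state_dim, dim), nn.ReLU())
        self.master = nn.Linear(dim, 2)               # which branch to take
        self.indep_head = nn.Linear(dim, n_slot_indep)
        self.slot_head = nn.Linear(dim, n_per_slot)   # shared across slots

    def forward(self, state, slot_states):
        """`state` is the full belief state; `slot_states` stacks one
        per-slot view of it, shape (n_slots, state_dim)."""
        h = self.phi_master(state)
        branch_logits = self.master(h)                # step 1: master decision
        indep_logits = self.indep_head(h)             # step 2a: slot-independent
        slot_logits = self.slot_head(self.phi_slot(slot_states))  # step 2b
        return branch_logits, indep_logits, slot_logits

Because every component is differentiable, phi_master and phi_slot are trained jointly with the policy heads by back-propagating the RL loss through them.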
Variational Cross-domain Natural Language Generation for Spoken Dialogue Systems
Bo-Hsiang Tseng | Florian Kreyssig | Paweł Budzianowski | Iñigo Casanueva | Yen-Chen Wu | Stefan Ultes | Milica Gašić
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue
Cross-domain natural language generation (NLG) is still a difficult task within spoken dialogue modelling. Given a semantic representation provided by the dialogue manager, the language generator should produce sentences that convey the desired information. Traditional template-based generators can produce sentences with all the necessary information, but these sentences are not sufficiently diverse. With RNN-based models, the diversity of the generated sentences can be high; in the process, however, some information is lost. In this work, we improve an RNN-based generator by considering latent information at the sentence level during generation, using a conditional variational autoencoder architecture. We demonstrate that our model outperforms the original RNN-based generator while yielding highly diverse sentences. In addition, our model performs better when the training data is limited.
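A sketch of the sentence-level conditional VAE component: at training time a latent z is inferred from the semantic condition c and a sentence encoding x, at generation time it is sampled from the prior p(z|c), and z conditions the RNN decoder alongside c. Dimensions and module names are illustrative assumptions.

import torch
import torch.nn as nn

class SentenceCVAE(nn.Module):
    def __init__(self, cond_dim, sent_dim, z_dim=16):
        super().__init__()
        self.prior = nn.Linear(cond_dim, 2 * z_dim)                 # p(z | c)
        self.posterior = nn.Linear(cond_dim + sent_dim, 2 * z_dim)  # q(z | c, x)

    def forward(self, cond, sent_repr):
        p_mu, p_logvar = self.prior(cond).chunk(2, dim=-1)
        q_mu, q_logvar = self.posterior(
            torch.cat([cond, sent_repr], dim=-1)).chunk(2, dim=-1)
        z = q_mu + torch.randn_like(q_mu) * (0.5 * q_logvar).exp()  # reparameterise
        # KL( q(z|c,x) || p(z|c) ) for two diagonal Gaussians, added to
        # the decoder's reconstruction loss during training.
        kl = 0.5 * (p_logvar - q_logvar
                    + (q_logvar.exp() + (q_mu - p_mu) ** 2) / p_logvar.exp() - 1)
        return z, kl.sum(dim=-1)

At generation time one would instead sample z from the prior, z = p_mu + torch.randn_like(p_mu) * (0.5 * p_logvar).exp(), which is what yields diverse sentences for the same semantic input.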