SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking

Atharva Kulkarni, Bo-Hsiang Tseng, Joel Ruben Antony Moniz, Dhivya Piraviperumal, Hong Yu, Shruti Bhargava


Abstract
In-context learning with Large Language Models (LLMs) has emerged as a promising avenue of research in Dialog State Tracking (DST). However, the best-performing in-context learning methods involve retrieving and adding similar examples to the prompt, requiring access to labeled training data. Procuring such training data for a wide range of domains and applications is time-consuming, expensive, and, at times, infeasible. While zero-shot learning requires no training data, it significantly lags behind the few-shot setup. Thus, we ask: 'Can we efficiently generate synthetic data for any dialogue schema to enable few-shot prompting?' Addressing this question, we propose SynthDST, a data generation framework tailored for DST, utilizing LLMs. Our approach only requires the dialogue schema and a few hand-crafted dialogue templates to synthesize natural, coherent, and free-flowing dialogues with DST annotations. Few-shot learning using data from SynthDST results in 4-5% improvement in Joint Goal Accuracy over the zero-shot baseline on MultiWOZ 2.1 and 2.4. Remarkably, our few-shot learning approach recovers nearly 98% of the performance compared to the few-shot setup using human-annotated training data.
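For readers unfamiliar with the metric reported above, Joint Goal Accuracy is commonly defined as the fraction of dialogue turns whose predicted dialogue state matches the gold state exactly, across all slots. The following is a minimal sketch under that common definition, not the authors' evaluation code; the function name `joint_goal_accuracy`, the slot names, and the example states are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# A dialogue state maps slot names to values, e.g. "hotel-area" -> "north".
DialogState = Dict[str, str]


def joint_goal_accuracy(predicted: List[DialogState], gold: List[DialogState]) -> float:
    """Fraction of turns whose predicted state equals the gold state exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    # A turn counts as correct only if every slot-value pair matches.
    exact = sum(1 for p, g in zip(predicted, gold) if p == g)
    return exact / len(gold)


# Hypothetical three-turn example (slot names and values are illustrative only).
gold_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]
predicted_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},  # one wrong slot value
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]
jga = joint_goal_accuracy(predicted_states, gold_states)
```

Because a single wrong slot makes the whole turn incorrect, the metric is strict: in this example two of the three turns match exactly.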
Anthology ID:
2024.eacl-long.120
Volume:
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1988–2001
URL:
https://aclanthology.org/2024.eacl-long.120
Cite (ACL):
Atharva Kulkarni, Bo-Hsiang Tseng, Joel Ruben Antony Moniz, Dhivya Piraviperumal, Hong Yu, and Shruti Bhargava. 2024. SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1988–2001, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking (Kulkarni et al., EACL 2024)
PDF:
https://aclanthology.org/2024.eacl-long.120.pdf
Note:
 2024.eacl-long.120.note.zip