Haeun Yu
2024
Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods
Haeun Yu | Pepa Atanasova | Isabelle Augenstein
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Language Models (LMs) acquire parametric knowledge from their training process, embedding it within their weights. The increasing scalability of LMs, however, poses significant challenges for understanding a model’s inner workings and further for updating or correcting this embedded knowledge without the significant cost of retraining. This underscores the importance of unveiling exactly what knowledge is stored and its association with specific model components. Instance Attribution (IA) and Neuron Attribution (NA) offer insights into this training-acquired knowledge, though they have not been compared systematically. Our study introduces a novel evaluation framework to quantify and compare the knowledge revealed by IA and NA. To align the results of the two methods, we introduce the attribution methods NA-Instances, which applies NA to retrieve influential training instances, and IA-Neurons, which discovers the important neurons of influential instances found by IA. We further propose a comprehensive list of faithfulness tests to evaluate the comprehensiveness and sufficiency of the explanations provided by both methods. Through extensive experiments and analysis, we demonstrate that NA generally reveals more diverse and comprehensive information regarding the LM’s parametric knowledge compared to IA. Nevertheless, IA provides unique and valuable insights into the LM’s parametric knowledge, which are not revealed by NA. Our findings further suggest the potential of a synergistic approach of combining the diverse findings of IA and NA for a more holistic understanding of an LM’s parametric knowledge.
2023
CopyT5: Copy Mechanism and Post-Trained T5 for Speech-Aware Dialogue State Tracking System
Cheonyoung Park | Eunji Ha | Yewon Jeong | Chi-young Kim | Haeun Yu | Joo-won Sung
Proceedings of The Eleventh Dialog System Technology Challenge
In a real-world environment, Dialogue State Tracking (DST) must perform its tasks on speech recognition results. However, most existing DST research has been conducted in text-based environments. This study aims to build a model that efficiently performs Automatic Speech Recognition-based DST. To operate robustly against speech noise, we used CopyT5, which adopts a copy mechanism, and trained the model on augmented data that includes speech noise. Furthermore, CopyT5 was post-trained on the MultiWOZ dataset with masked language modeling in T5 in order to learn the dialogue context better. The copy mechanism also mitigated named entity errors that may occur during DST generation. Experiments confirmed that data augmentation, post-training, and the copy mechanism effectively improve DST performance.