Tianshu Wang


2024

pdf bib
Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models
Boxi Cao | Qiaoyu Tang | Hongyu Lin | Shanshan Jiang | Bin Dong | Xianpei Han | Jiawei Chen | Tianshu Wang | Le Sun
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Memory is one of the most essential cognitive functions serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. On the contrary, vanilla neural networks without pre-training have been long observed suffering from the catastrophic forgetting problem. To investigate such a retentive-forgetful contradiction and understand the memorizing dynamic mechanism of language models, we conduct thorough experiments by controlling the target knowledge types, the learning strategies and the learning schedules. We find that: 1) Vanilla language models without pre-training are forgetful; 2) Pre-training leads to retentive language models; 3) Knowledge relevance and diversification significantly influence the memory formation. These conclusions are useful for understanding the abilities of pre-trained language models and shed light on designing and evaluating new learning and inference algorithms of language models.

2022

pdf bib
Improving Faithfulness by Augmenting Negative Summaries from Fake Documents
Tianshu Wang | Faisal Ladhak | Esin Durmus | He He
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Current abstractive summarization systems tend to hallucinate content that is unfaithful to the source document, posing a risk of misinformation. To mitigate hallucination, we must teach the model to distinguish hallucinated summaries from faithful ones. However, the commonly used maximum likelihood training does not disentangle factual errors from other model errors. To address this issue,we propose a back-translation-style approach to augment negative samples that mimic factual errors made by the model. Specifically, we train an elaboration model that generates hallucinated documents given the reference summaries, and then generates negative summaries from the fake documents. We incorporate the negative samples into training through a controlled generator, which produces faithful/unfaithful summaries conditioned on the control codes. Additionally, we find that adding textual entailment data through multitasking further boosts the performance. Experiments on three datasets (XSum, Gigaword, and WikiHow) show that our method consistently improves faithfulness without sacrificing informativeness according to both human and automatic evaluation