Pre-Trained Language Models Augmented with Synthetic Scanpaths for Natural Language Understanding

Shuwen Deng, Paul Prasse, David Reich, Tobias Scheffer, Lena Jäger


Abstract
Human gaze data offer cognitive information that reflects natural language comprehension. Indeed, augmenting language models with human scanpaths has proven beneficial for a range of NLP tasks, including language understanding. However, the applicability of this approach is limited: text corpora are abundant, whereas gaze data are scarce. Although models that generate human-like scanpaths during reading have been developed, the potential of synthetic gaze data across NLP tasks remains largely unexplored. We develop a model that integrates synthetic scanpath generation with a scanpath-augmented language model, eliminating the need for human gaze data. Since the model’s error gradient can be propagated through all parts of the model, the scanpath generator can be fine-tuned to downstream tasks. We find that the proposed model not only outperforms the underlying language model but also achieves performance comparable to that of a language model augmented with real human gaze data. Our code is publicly available.
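The abstract states that the error gradient can reach the scanpath generator, but it does not say how the discrete choice of which word to fixate next is made differentiable. One standard option is the Gumbel-softmax relaxation, and the sketch below simply assumes it; it is not presented as the authors' method. The helper names (`gumbel_softmax`, `soft_scanpath_embeddings`) and the list-based data layout are hypothetical. The code shows only the forward computation in plain Python. Each fixation step produces a convex combination of word embeddings rather than a hard index lookup, so the result is a smooth function of the generator's logits, which an autodiff framework could backpropagate through.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample over word positions (assumed relaxation)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_scanpath_embeddings(fixation_logits, word_embeddings, temperature, rng):
    """Hypothetical helper: one soft-selected embedding per predicted fixation.

    fixation_logits: for each fixation step, one logit per word position.
    word_embeddings: one embedding vector (list of floats) per word position.
    """
    dim = len(word_embeddings[0])
    sequence = []
    for step_logits in fixation_logits:
        weights = gumbel_softmax(step_logits, temperature, rng)
        # Weighted sum of word embeddings instead of a hard index lookup,
        # so the output varies smoothly with the generator's logits.
        fixated = [
            sum(w * emb[d] for w, emb in zip(weights, word_embeddings))
            for d in range(dim)
        ]
        sequence.append(fixated)
    return sequence


if __name__ == "__main__":
    rng = random.Random(0)
    words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 2-d embeddings for 3 words
    # Toy generator output: strongly prefers word 1, then word 0.
    logits = [[0.0, 100.0, 0.0], [100.0, 0.0, 0.0]]
    for vec in soft_scanpath_embeddings(logits, words, temperature=0.5, rng=rng):
        print([round(x, 3) for x in vec])
```

With these strongly peaked toy logits the two printed vectors are `[0.0, 1.0]` and `[1.0, 0.0]`, i.e. the embeddings of the preferred words. The temperature controls the trade-off: lower values make each step closer to a hard fixation on one word, while higher values spread the weights and give smoother gradients. In a full scanpath-augmented model, a sequence like this would be passed on to the language model's downstream layers; that wiring is not shown here.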
Anthology ID:
2023.emnlp-main.400
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
6500–6507
URL:
https://aclanthology.org/2023.emnlp-main.400
DOI:
10.18653/v1/2023.emnlp-main.400
Cite (ACL):
Shuwen Deng, Paul Prasse, David Reich, Tobias Scheffer, and Lena Jäger. 2023. Pre-Trained Language Models Augmented with Synthetic Scanpaths for Natural Language Understanding. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6500–6507, Singapore. Association for Computational Linguistics.
Cite (Informal):
Pre-Trained Language Models Augmented with Synthetic Scanpaths for Natural Language Understanding (Deng et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.400.pdf
Video:
https://aclanthology.org/2023.emnlp-main.400.mp4