2024
Temperature-scaling surprisal estimates improve fit to human reading times – but does it do so for the “right reasons”?
Tong Liu | Iza Škrjanec | Vera Demberg
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
A wide body of evidence shows that human language processing difficulty is predicted by the information-theoretic measure surprisal, a word’s negative log probability in context. However, it is still unclear how best to estimate the probabilities needed for predicting human processing difficulty. While a long-standing belief held that models with lower perplexity would provide more accurate estimates of word predictability, and therefore lead to better reading time predictions, recent work has shown that for very large models, psycholinguistic predictive power decreases. One reason could be that language models might be more confident of their predictions than humans, because they have had exposure to several orders of magnitude more data. In this paper, we test what effect temperature-scaling of large language model (LLM) predictions has on surprisal estimates and on their power to predict reading times of English texts. Firstly, we show that calibration of large language models typically improves with model size, i.e. poorer calibration cannot account for poorer fit to reading times. Secondly, we find that temperature-scaling probabilities leads to a systematically better fit to reading times (up to 89% improvement in delta log likelihood) across several reading time corpora. Finally, we show that this improvement in fit is chiefly driven by words that are composed of multiple subword tokens.
2016
Learning to Identify Sentence Parallelism in Student Essays
Wei Song | Tong Liu | Ruiji Fu | Lizhen Liu | Hanshi Wang | Ting Liu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Parallelism is an important rhetorical device. We propose a machine learning approach for automated sentence parallelism identification in student essays. We build an essay dataset annotated for sentence-level parallelism. We derive features by combining generalized word alignment strategies and the alignment measures between word sequences. The experimental results show that sentence parallelism can be effectively identified, with an F1 score of 82% at the pair-wise level and 72% at the parallelism chunk level. Based on this approach, we automatically identify sentence parallelism in more than 2000 student essays and study the correlation between the use of sentence parallelism and the types and quality of essays.
Understanding Discourse on Work and Job-Related Well-Being in Public Social Media
Tong Liu | Christopher Homan | Cecilia Ovesdotter Alm | Megan Lytle | Ann Marie White | Henry Kautz
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2014
Toward Macro-Insights for Suicide Prevention: Analyzing Fine-Grained Distress at Scale
Christopher Homan | Ravdeep Johar | Tong Liu | Megan Lytle | Vincent Silenzio | Cecilia Ovesdotter Alm
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality