Peter Zeng
2024
Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground
Adil Soubki | John Murzaku | Arash Yousefi Jordehi | Peter Zeng | Magdalena Markowska | Seyed Abolghasem Mirroshandel | Owen Rambow
Findings of the Association for Computational Linguistics ACL 2024
Evaluating the theory of mind (ToM) capabilities of language models (LMs) has recently received a great deal of attention. However, many existing benchmarks rely on synthetic data, which risks misaligning the resulting experiments with human behavior. We introduce the first ToM dataset based on naturally occurring spoken dialogs, Common-ToM, and show that LMs struggle to demonstrate ToM. We then show that integrating a simple, explicit representation of beliefs improves LM performance on Common-ToM.
2022
Re-Examining FactBank: Predicting the Author’s Presentation of Factuality
John Murzaku | Peter Zeng | Magdalena Markowska | Owen Rambow
Proceedings of the 29th International Conference on Computational Linguistics
We present a corrected version of a subset of the FactBank data set. Previously published results on FactBank are no longer valid. We perform experiments on FactBank using multiple training paradigms, data smoothing techniques, and polarity classifiers. We argue that F-measure is an important alternative evaluation metric for factuality. We provide new state-of-the-art results for four corpora including FactBank. We perform an error analysis on FactBank combined with two similar corpora.
Co-authors
- John Murzaku 2
- Magdalena Markowska 2
- Owen Rambow 2
- Adil Soubki 1
- Arash Yousefi Jordehi 1