CELLO: Causal Evaluation of Large Vision-Language Models

Meiqi Chen, Bo Peng, Yan Zhang, Chaochao Lu


Abstract
Causal reasoning is fundamental to human intelligence and crucial for effective decision-making in real-world environments. Despite recent advancements in large vision-language models (LVLMs), their ability to comprehend causality remains unclear. Previous work typically focuses on commonsense causality between events and/or actions, which is insufficient for applications like embodied agents and lacks the explicitly defined causal graphs required for formal causal reasoning. To overcome these limitations, we introduce a fine-grained and unified definition of causality involving interactions between humans and/or objects. Building on the definition, we construct a novel dataset, CELLO, consisting of 14,094 causal questions across all four levels of causality: discovery, association, intervention, and counterfactual. This dataset surpasses traditional commonsense causality by including explicit causal graphs that detail the interactions between humans and objects. Extensive experiments on CELLO reveal that current LVLMs still struggle with causal reasoning tasks, but they can benefit significantly from our proposed CELLO-CoT, a causally inspired chain-of-thought prompting strategy. Both quantitative and qualitative analyses from this study provide valuable insights for future research. Our project page is at https://github.com/OpenCausaLab/CELLO.
Anthology ID:
2024.emnlp-main.1247
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
22353–22374
Language:
URL:
https://aclanthology.org/2024.emnlp-main.1247/
DOI:
10.18653/v1/2024.emnlp-main.1247
Bibkey:
Cite (ACL):
Meiqi Chen, Bo Peng, Yan Zhang, and Chaochao Lu. 2024. CELLO: Causal Evaluation of Large Vision-Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 22353–22374, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
CELLO: Causal Evaluation of Large Vision-Language Models (Chen et al., EMNLP 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.emnlp-main.1247.pdf