The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems

Caleb Ziems, Jane Yu, Yi-Chia Wang, Alon Halevy, Diyi Yang


Abstract
Conversational agents have come increasingly closer to human competence in open-domain dialogue settings; however, such models can reflect insensitive, hurtful, or entirely incoherent viewpoints that erode a user’s trust in the moral integrity of the system. Moral deviations are difficult to mitigate because moral judgments are not universal, and there may be multiple competing judgments that apply to a situation simultaneously. In this work, we introduce a new resource, not to authoritatively resolve moral ambiguities, but instead to facilitate systematic understanding of the intuitions, values and moral judgments reflected in the utterances of dialogue systems. The Moral Integrity Corpus, MIC, is such a resource, which captures the moral assumptions of 38k prompt-reply pairs, using 99k distinct Rules of Thumb (RoTs). Each RoT reflects a particular moral conviction that can explain why a chatbot’s reply may appear acceptable or problematic. We further organize RoTs with a set of 9 moral and social attributes and benchmark performance for attribute classification. Most importantly, we show that current neural language models can automatically generate new RoTs that reasonably describe previously unseen interactions, but they still struggle with certain scenarios. Our findings suggest that MIC will be a useful resource for understanding and language models’ implicit moral assumptions and flexibly benchmarking the integrity of conversational agents. To download the data, see https://github.com/GT-SALT/mic
Anthology ID:
2022.acl-long.261
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3755–3773
Language:
URL:
https://aclanthology.org/2022.acl-long.261
DOI:
10.18653/v1/2022.acl-long.261
Bibkey:
Cite (ACL):
Caleb Ziems, Jane Yu, Yi-Chia Wang, Alon Halevy, and Diyi Yang. 2022. The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3755–3773, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems (Ziems et al., ACL 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.acl-long.261.pdf
Video:
 https://aclanthology.org/2022.acl-long.261.mp4
Code
 gt-salt/mic +  additional community code
Data
ETHICS