A Comparative Study of Explicit and Implicit Gender Biases in Large Language Models via Self-evaluation

Yachao Zhao, Bo Wang, Yan Wang, Dongming Zhao, Xiaojia Jin, Jijun Zhang, Ruifang He, Yuexian Hou


Abstract
While extensive work has examined the explicit and implicit biases in large language models (LLMs), little research explores the relation between these two types of biases. This paper presents a comparative study of the explicit and implicit biases in LLMs grounded in social psychology. Social psychology distinguishes between explicit and implicit biases by whether the bias can be self-recognized by individuals. Aligning with this conceptualization, we propose a self-evaluation-based two-stage measurement of explicit and implicit biases within LLMs. First, the LLM is prompted to automatically fill templates with social targets to measure implicit bias toward these targets, where the bias is less likely to be self-recognized by the LLM. Then, the LLM is prompted to self-evaluate the templates filled by itself to measure explicit bias toward the same targets, where the bias is more likely to be self-recognized by the LLM. Experiments conducted on state-of-the-art LLMs reveal human-like inconsistency between explicit and implicit occupational gender biases. This work bridges a critical gap where prior studies concentrate solely on either explicit or implicit bias. We advocate that future work highlight the relation between explicit and implicit biases in LLMs.
Anthology ID:
2024.lrec-main.17
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
186–198
Language:
URL:
https://aclanthology.org/2024.lrec-main.17
DOI:
Bibkey:
Cite (ACL):
Yachao Zhao, Bo Wang, Yan Wang, Dongming Zhao, Xiaojia Jin, Jijun Zhang, Ruifang He, and Yuexian Hou. 2024. A Comparative Study of Explicit and Implicit Gender Biases in Large Language Models via Self-evaluation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 186–198, Torino, Italia. ELRA and ICCL.
Cite (Informal):
A Comparative Study of Explicit and Implicit Gender Biases in Large Language Models via Self-evaluation (Zhao et al., LREC-COLING 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.lrec-main.17.pdf