The Role of Syntactic Span Preferences in Post-Hoc Explanation Disagreement

Jonathan Kamp, Lisa Beinborn, Antske Fokkens


Abstract
Post-hoc explanation methods for transformer models tend to disagree with one another. Agreement is generally measured on a small subset of the most important tokens. However, the presence of disagreement is often overlooked and the reasons for disagreement insufficiently examined, causing these methods to be utilised without adequate care. In this work, we explain disagreement from a linguistic perspective. We find that different methods systematically select different token types. Additionally, similar methods display similar linguistic preferences, which consequently affect agreement. By estimating the subsets of *k* most important tokens dynamically over sentences, we find that methods agree better at the syntactic span level. The methods that agree least with the others benefit most from this dynamic subset estimation. We methodically explore the different settings of the dynamic *k* approach: we observe that its combination with spans yields favourable results in capturing important signals in the sentence, and propose an improved setting of global token importance.
Anthology ID:
2024.lrec-main.1397
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
16066–16078
URL:
https://aclanthology.org/2024.lrec-main.1397
Cite (ACL):
Jonathan Kamp, Lisa Beinborn, and Antske Fokkens. 2024. The Role of Syntactic Span Preferences in Post-Hoc Explanation Disagreement. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16066–16078, Torino, Italia. ELRA and ICCL.
Cite (Informal):
The Role of Syntactic Span Preferences in Post-Hoc Explanation Disagreement (Kamp et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1397.pdf