Coreference in Long Documents using Hierarchical Entity Merging

Talika Gupta, Hans Ole Hatzel, Chris Biemann


Abstract
Current top-performing coreference resolution approaches are limited with regard to the maximum length of texts they can accept. We explore a technique for recursively merging entities that allows us to apply coreference models to texts of arbitrary length, as found in many narrative genres. In experiments on established datasets, we quantify the drop in resolution quality caused by this approach. Finally, we use an under-explored resource in the form of a fully coreference-annotated novel to illustrate our model’s performance for long documents in practice. Here, we achieve state-of-the-art performance, outperforming previous systems capable of handling long documents.
Anthology ID:
2024.latechclfl-1.2
Volume:
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Stan Szpakowicz
Venues:
LaTeCHCLfL | WS
Publisher:
Association for Computational Linguistics
Pages:
11–17
URL:
https://aclanthology.org/2024.latechclfl-1.2
Cite (ACL):
Talika Gupta, Hans Ole Hatzel, and Chris Biemann. 2024. Coreference in Long Documents using Hierarchical Entity Merging. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), pages 11–17, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Coreference in Long Documents using Hierarchical Entity Merging (Gupta et al., LaTeCHCLfL-WS 2024)
PDF:
https://aclanthology.org/2024.latechclfl-1.2.pdf
Video:
https://aclanthology.org/2024.latechclfl-1.2.mp4