Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models

Junpeng Li, Zixia Jia, Zilong Zheng


Abstract
Document-level Relation Extraction (DocRE), which aims to extract relations from a long context, is a critical challenge in achieving fine-grained structural comprehension and generating interpretable document representations. Inspired by recent advances in the in-context learning capabilities that emerge from large language models (LLMs) such as ChatGPT, we aim to design an automated annotation method for DocRE with minimal human effort. Unfortunately, vanilla in-context learning is infeasible for DocRE because of the large number of predefined fine-grained relation types and the uncontrolled generations of LLMs. To tackle this issue, we propose a method that integrates an LLM and a natural language inference (NLI) module to generate relation triples, thereby augmenting document-level relation datasets. We demonstrate the effectiveness of our approach by introducing an enhanced dataset, DocGNRE, which excels in re-annotating numerous long-tail relation types. We are confident that our method holds potential for broader application to domain-specific relation type definitions and offers tangible benefits in advancing generalized language semantic comprehension.
Anthology ID:
2023.emnlp-main.334
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5495–5505
URL:
https://aclanthology.org/2023.emnlp-main.334
DOI:
10.18653/v1/2023.emnlp-main.334
Cite (ACL):
Junpeng Li, Zixia Jia, and Zilong Zheng. 2023. Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5495–5505, Singapore. Association for Computational Linguistics.
Cite (Informal):
Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models (Li et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.334.pdf
Video:
https://aclanthology.org/2023.emnlp-main.334.mp4